THE SCIENTIFIC STUDY OF 
EDUCATIONAL PROBLEMS 


BY 

WALTER S. MONROE 

PROFESSOR OF EDUCATION AND DIRECTOR, BUREAU OF 
EDUCATIONAL RESEARCH, UNIVERSITY OF ILLINOIS 

AND 

MAX D. ENGELHART 

DIRECTOR, DEPARTMENT OF EXAMINATIONS, 
CHICAGO CITY JUNIOR COLLEGES 


NEW YORK 

THE MACMILLAN COMPANY 
1936 



COPYRIGHT, 1936, 

By THE MACMILLAN COMPANY 

ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE 
REPRODUCED IN ANY FORM WITHOUT PERMISSION IN WRITING 
FROM THE PUBLISHER, EXCEPT BY A REVIEWER WHO WISHES 
TO QUOTE BRIEF PASSAGES IN CONNECTION WITH A REVIEW 
WRITTEN FOR INCLUSION IN MAGAZINE OR NEWSPAPER 

Published 1936 


SET UP AND ELECTROTYPED BY T. MOREY & SON 

Printed in the United States of America 



PREFACE 


This volume is addressed to a varied audience. It is intended 
to provide a basic text for graduate students in education and 
others who are interested in learning how to study educational 
problems scientifically. It is intended to serve also as a source 
of information for research workers. A third group consists of 
the consumers of educational research. These include 
superintendents, principals, supervisors, teachers, and others who 
endeavor to ascertain what research has accomplished in the 
field of education and to interpret the findings with respect to 
theory or practice. This large consumer group is addressed 
for the purpose of engendering attitudes and other abilities 
necessary for reading intelligently reports of studies in 
educational periodicals and other publications. Recognition of these 
three groups has created problems in determining the content 
which would not have arisen if the volume had been addressed 
to a more homogeneous audience. The degree of wisdom that 
the authors have exercised in the selection of topics and in the 
treatment of them can be determined only by those who make 
use of the results of their efforts. 

It seemed wise to include a fairly comprehensive treatment 
of the statistical techniques employed in educational research. 
For those most likely to be used by graduate students and 
other persons interested in attempting studies in the field of 
education, the treatment is sufficiently detailed so that it should 
not be necessary to consult other sources. For the less commonly 
employed techniques, the treatment is somewhat abbreviated, 
but references are given to sources from which additional 
information may be secured. Considerable space is 
given to the interpretation of statistics, a topic that has not 
received adequate attention in texts on educational statistics. 
It is hoped that the volume may find a place as a text for courses 
in statistics as well as courses devoted to the study of 
educational research. 

The volume is critical. Educational research is a new field 
and, as is to be expected, its techniques have been only 
imperfectly understood by many of those who have employed 
them. In our enthusiasm in attempting to deal with educational 
problems scientifically, many errors have been made. By calling 
attention to the limitations of educational data, and by revealing 
difficulties encountered in educational research, the authors 
hope they have succeeded in presenting an adequate picture 
of the situation and thereby have provided a basis for more 
constructive research endeavors. The authors have faith in 
the possibilities of educational research and they hope the 
reader will develop an intelligent optimism. 

In view of the number of texts on educational statistics and 
the attempts that have been made to describe educational 
research, a new venture in this field should justify itself as a 
contribution. The evaluation of the present volume will be made 
by those who examine its pages systematically. But the 
authors may be permitted the privilege of mentioning certain 
features that they believe make it distinctive. The interpretation 
of the findings of research in the light of their dependability 
has been emphasized by directing attention to the limitations 
of educational data and of the techniques employed in handling 
them. Descriptions of statistical techniques may be found in 
other writings but the competent reader who examines the 
volume carefully will note a number of details, especially in 
Chapters X and XI, that are not generally known. Another 
feature is the comprehensiveness of the volume which is 
indicated by the topical index. The limitations of space prohibit 
more than a brief mention of a number of the topics listed, but 
in most such cases references are given so that the interested 
reader may consult other sources. The organization of the 
volume, together with this index, should make it an effective 
source of information. Finally, the authors have attempted, 
especially in the concluding chapter, to indicate the types of 
studies to which workers must direct their efforts if a science 
of education is to be built up. 

It will be obvious to the reader that the authors are indebted 
to many research workers and other writers in this field. They 
have attempted to acknowledge this indebtedness by numerous 
references to sources, but these citations do not adequately 
indicate the extent to which their thinking has been stimulated 
and guided by other persons. The limitations of space prohibit 
listing the names of this large group, but the senior author 
desires to make a general acknowledgment of his indebtedness 
to his students, especially those on the graduate level, and to 
the members of the staff of the Bureau of Educational Research 
during the period, 1921 to 1931. Specific mention should be 
made of Dr. P. T. Grata and Dr. D. B. Stuit, both of whom read 
portions of the manuscript, and of Dr. Harold Gulliksen and 
Mr. L. R. Tucker, who were consulted in regard to factor 
analysis. Several findings from the dissertation of Dr. Stuit 
are reproduced in Chapter XI. Finally, the authors are grateful 
to Miss Neva M. Covey for her intelligent typing and checking 
of the manuscript. 

Walter S. Monroe 
Max D. Engelhart 

September, 1936 



EDITOR’S INTRODUCTION 


It is a widely recognized truth that the characteristics of 
adult society are dependent more largely upon the education 
given to its children and to its youths than upon any other sin- 
gle factor. Consequently, nothing, certainly, can be of greater 
concern to society than genuinely sound educational thinking. 
The methods of thinking which have revealed what is known 
about the structure of the universe through astronomy, physics, 
chemistry, and biology, have in recent years been applied to 
the problems of education with ever increasing fruitfulness. 

Professor Monroe and Dr. Engelhart in The Scientific Study 
of Educational Problems have put these fruitful procedures into 
readily understandable terms in order that they may be learned 
and utilized by the greatest possible number of educational 
workers. The authors have endeavored to cast the material in 
the form in which educators approach their own subject matter, 
rather than to arrange it in some abstract order which might 
conceal scientific processes in educational thinking from all but 
the most highly trained technicians. 

It is the particular hope of the authors and of the editor that 
not only will those persons who go forward to discover new 
truths in education be stimulated and assisted by this volume, 
but also that the group of educators ultimately most important, 
the administrators and the teachers who actually deal with 
children and with young people, will receive substantial help 
from this volume in understanding and interpreting the in- 
vestigations of research workers. Scientific data in education 
would have little, if any, significance if they were to remain 
only on shelves of research monographs; they become of im- 
portance to society when they are finally utilized by a wise 
teacher to improve the experiences and enrich the opportunities 
of her pupils. 


PURDUE UNIVERSITY 
September, 1936 


Harriet E. O’Shea 




THE SCIENTIFIC STUDY OF 
EDUCATIONAL PROBLEMS 


CHAPTER I 
INTRODUCTION 

The meaning of educational research. A person confronted 
with a question may accept the first answer that occurs to him; 
he may derive an answer from previous experience or casual 
observation; he may consult other persons in regard to the 
answer; or he may seek his answer to the question in the 
published opinions of others. Such means of answering educational 
questions are not those of educational research. In contrast, 
educational research may be thought of as the total procedure 
employed in collecting, handling, and interpreting data for the 
purpose of arriving at dependable answers to questions about 
education.¹ 

Sometimes the question for which an answer is sought is so 
simple that the total activity of answering it involves little more 
than collecting the required data and applying simple 
arithmetical techniques. For example, suppose a city superintendent 
wishes to ascertain the grade enrollments of his system. He 
asks his teachers to report the number of pupils enrolled with 
a designation of their grade classification. The numbers reported 
are summarized and the totals constitute the answer to the 
question. If such activity is labeled “educational research,” 
it is obvious that the term will have a very general meaning. 
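
The superintendent's tally described above amounts to summing the teachers' reports by grade. A minimal sketch in Python follows, assuming each report is a pair of grade label and pupils enrolled; the function name `grade_enrollments` and the figures are hypothetical and serve only as illustration.

```python
from collections import Counter


def grade_enrollments(reports):
    """Sum teacher reports of (grade, pupils enrolled) into totals per grade."""
    totals = Counter()
    for grade, pupils in reports:
        totals[grade] += pupils
    return dict(totals)


# Hypothetical reports from five teachers; not data from any actual system.
reports = [("1", 32), ("1", 29), ("2", 30), ("2", 27), ("3", 35)]
totals = grade_enrollments(reports)
print(totals)
```

Nothing beyond addition is involved, which is the point of the illustration: the answer is dependable, yet the activity calls for no technique beyond simple arithmetic.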

¹ This statement requires modification when educational research is regarded 
as inclusive of the philosophical type of inquiry. The conclusions of 
philosophical educational research are in the nature of decisions with respect to 
“what should be.” For further discussion, see Chapter XII. 


Examination of educational writings reveals that although 
the term is not applied in such cases, the designation of 
“educational research” is frequently given to relatively simple 
and routine investigations such as surveys of current conditions 
or practices. This custom is unfortunate. The usage of the 
term should be limited so that research in education will be 
comparable in its essential characteristics with research in 
other fields. 

Illustrations of educational research. Before attempting to 
point out the essential characteristics of educational research, 
a few studies will be described briefly. If feasible, the reader 
should consult the references cited and study the complete 
reports for a more adequate understanding of the nature of 
these researches. 

1. The Department of Superintendence study of the status of 
the superintendent.¹ The problems of this survey investigation 
were as follows: “(1) To determine the status of the 
superintendent of schools with reference to training, experience, 
and tenure; (2) To determine the facts regarding the financial 
compensation of the superintendent of schools; (3) To 
determine the professional activities in which the superintendent 
of schools is engaged; (4) To determine as far as possible the 
economic status of the superintendent; (5) To determine the 
interrelationships between elements mentioned above.”² 

Data were collected chiefly by means of a questionnaire 
mailed to city superintendents under the auspices of a committee 
of the Department of Superintendence of the National Education 
Association. The questionnaire is characterized by questions 
asking for facts rather than expressions of opinion. Returns 
were received from 1181 superintendents, approximately 40 
per cent of the superintendents in cities over 2500 in population 
in the United States. Some additional data were collected 
from the Educational Directory of the United States Bureau 
of Education. Although the statistical techniques employed in 
summarizing the data are very simple, the number of 
questionnaire returns was so large that mechanical means of 
tabulation were employed. 

¹ Chadsey, C. E., et al. “The Status of the Superintendent,” First Yearbook 
of the Department of Superintendence. Washington: National Education 
Association, 1923. 206 pp. 

² Ibid., p. 10. 

2. The Judd and Buswell study of the nature of eye-movements 
in silent reading.¹ The problem of this laboratory study was 
to determine the characteristics of the eye-movements made 
in certain types of reading. The data were collected by securing 
a photographic record of the eye-movements of a number of 
subjects while engaging in reading silently various types of 
material and in reading the same material in response to various 
types of requests.² From the photographic record on a moving 
picture film the investigators were able to determine the 
following facts for each line of the text read: (1) number of 
eye-movements, (2) their direction (forward or regressive), and 
(3) the length of each period of fixation. 

In the Judd and Buswell study photographic records were 
secured for various types of silent reading: rapid, superficial 
reading; slow, careful reading preparatory to answering 
questions; reading when difficult words are encountered; reading 
an easy poem; reading a passage to be reproduced; reading 
with grammatical analysis; reading a foreign language. The 
numerical data derived from the photographic records were 
assembled to show the variations in the performance of a 
subject when engaging in the different types of reading and to show 
the performances of different subjects for a given type of reading. 

It should be noted that the data collected are highly objective, 
that is, there was very little opportunity for them to be 
influenced by any prejudices or desires of the investigators. Another 
person using the same subjects and following the same procedure 
would have secured approximately the same data. The data 
are also highly accurate and valid. It may be noted also that 
only simple arithmetical techniques were employed in handling 
the data. 

¹ Judd, C. H., and Buswell, G. T. “Silent Reading: A Study of the Various 
Types,” Supplementary Educational Monographs, No. 23. Chicago: University 
of Chicago, 1922. 160 pp. 

² The apparatus used for securing the photographic record is somewhat 
complicated and need not be described here. In reading, the movement of the eyes is 
not a smooth rotation from the left to the right but a series of short rapid 
movements and brief pauses or periods of fixation. Furthermore, there may be some 
movements from right to left which are known as regressive. 
For a description of the apparatus see 
Gray, C. T. “Types of Reading Ability as Exhibited through Tests and 
Laboratory Experiments,” Supplementary Educational Monographs, Vol. 1, No. 5. 
Chicago: University of Chicago, 1917, pp. 83-91. 

3. The Newark phonics experiment.¹ The problem of this 
controlled experiment was to determine the relative effectiveness 
of teaching beginning reading with instruction in phonics 
and without instruction in phonics. The characteristic of the 
study is the employment of the experimental technique in 
collecting the data. One group of pupils was given instruction 
in phonics and another group with nearly identical status with 
respect to reading ability was instructed without training in 
phonics. At the end of five months four reading tests were 
administered to all pupils. The pupils were tested again as 
they completed the work of grade IA and again as they 
completed IIB. The scores obtained by administering these tests 
constitute the data of the experiment. Collecting the data, 
however, involved more than merely administering tests. 
Setting up the experiment so that the experimental and control 
groups would be equivalent in reading ability and conducting 
the study so that the other non-experimental factors would 
be adequately controlled are essential phases of the process. 
Differences in average gains in reading ability of the 
experimental group and the control group were computed. The 
statistical procedures, however, are relatively simple. 
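
The comparison of average gains mentioned above may be sketched in a few lines of Python. This is only an illustration under stated assumptions: each pupil has one initial and one final score, the two groups begin with equal mean scores, and the scores and the name `average_gain` are hypothetical. They are not the Newark data.

```python
from statistics import mean


def average_gain(initial, final):
    """Mean of each pupil's gain (final score minus initial score)."""
    if len(initial) != len(final):
        raise ValueError("each pupil needs an initial and a final score")
    return mean(f - i for i, f in zip(initial, final))


# Hypothetical reading scores; NOT the scores of the Newark experiment.
phonics_initial = [10, 12, 11, 9]    # experimental group, mean 10.5
phonics_final = [18, 21, 17, 16]
control_initial = [10, 11, 12, 9]    # control group, mean 10.5
control_final = [16, 17, 19, 14]

phonics_gain = average_gain(phonics_initial, phonics_final)
control_gain = average_gain(control_initial, control_final)
difference = phonics_gain - control_gain
print(phonics_gain, control_gain, difference)
```

The arithmetic is simple; the weight of such a comparison rests on the equivalence of the groups and the control of non-experimental factors, not on the computation.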

4. A prediction study.² Douglass has reported a study in which 
he sought to ascertain the value of high school record, 
intelligence test score, and certain other data for predicting the success 
of students entering college and to derive a formula for using 
the data shown to be valuable. Most of the data were copied 
from records on file in the registrar’s office at the University of 
Oregon. Certain information was obtained from the state 
educational directory. An intelligence test was administered to secure 
a measure of the intelligence of the entrants. In studying the 
data both partial correlation and multiple correlation were 
employed. Hence the statistical procedures in this study were 
more intricate than those employed in the three preceding ones. 

¹ Sexton, E. K., and Herron, J. S. “The Newark Phonics Experiment,” 
Elementary School Journal, 28:690-701, May, 1928. 

² Douglass, H. R. “The Relation of High School Preparation and Certain 
Other Factors to Academic Success at the University of Oregon,” University of 
Oregon Publication, Vol. 3, No. 1. Eugene, Oregon: University of Oregon 
Press, 1931. 61 pp. 
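
To suggest what multiple correlation adds to the simpler procedures, the following Python sketch computes the multiple correlation of a criterion with two predictors from the three pairwise correlations, together with the standardized weights of the corresponding prediction formula. It uses the standard two-predictor identity and is not a reconstruction of Douglass's computations; the function names and the correlations are hypothetical.

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with predictors 1 and 2.

    Standard two-predictor identity:
        R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)
    """
    if abs(r_12) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


def standard_weights(r_y1, r_y2, r_12):
    """Standardized regression weights for the two predictors."""
    denom = 1 - r_12**2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Hypothetical correlations (e.g. college marks with high school record and
# with test score); these are NOT findings of the Douglass study.
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3
R = multiple_r_two_predictors(r_y1, r_y2, r_12)
b1, b2 = standard_weights(r_y1, r_y2, r_12)
print(round(R, 3), round(b1, 3), round(b2, 3))
```

With these illustrative figures the multiple correlation exceeds either single correlation, which is the sense in which a combination of data may predict better than any one item alone.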

5. The Heilman study of the relative influence of certain 
hereditary and environmental factors on educational achievement.¹ The 
problem of this investigation was that of determining “the 
relative influence upon scholastic achievement of mental age, 
school attendance, and socio-economic status of the home.”² 
The problem has also been stated by Heilman as that of 
determining “which of three factors, mental age, school attendance, 
or the socio-economic status of the home, had the greatest 
weight in producing individual differences in the educational 
age of ten-year-old children.”³ 

The following data were collected for 828 ten-year-old 
children: (1) Educational age in months; (2) Mental age in 
months; (3) Life age in months; (4) School attendance in days 
for each grade (and for the kindergarten); (5) A measure of 
the socio-economic status of the home; (6) Date of entering 
the first grade (and of the kindergarten for those children who 
had kindergarten training). Educational ages were computed 
from scores on the Stanford Achievement Test. Mental ages 
of the children were secured by employing the Stanford 
Revision of the Binet Intelligence Scale. Age and attendance 
data for the children were obtained from the records of the 
Denver schools. Data pertaining to the socio-economic status 
of the homes of the children were obtained by means of a revision 
of the Chapman-Sims Socio-economic Scale. 

¹ Heilman, J. D. “The Relative Influence upon Educational Achievement of 
Some Hereditary and Environmental Factors,” Twenty-Seventh Yearbook of the 
National Society for the Study of Education, Part II. Bloomington, Illinois: 
Public School Publishing Company, 1928, pp. 35-65. 

² Ibid., p. 35. 

³ Heilman, J. D. “Factors Determining Achievement and Grade Location,” 
Pedagogical Seminary and Journal of Genetic Psychology, 36:435, September, 
1929. 

In handling his data Heilman found it necessary to postulate 
the directions of causation. On a priori grounds he inferred that 
educational age is directly influenced by school attendance, 
mental age, and socio-economic status. He assumed also that 
mental age indirectly influences educational age through its 
influence on socio-economic status and on school attendance; 
that socio-economic status indirectly influences educational 
age through school attendance; and that no significant 
reciprocal influences operated. In the statistical treatment of his data 
Heilman computed coefficients of correlation between 
educational age and mental age, educational age and socio-economic 
status, educational age and chronological age, and so on for 
each of the possible combinations of the paired sets of measures. 
He then applied the technique of partial correlation as a means 
of eliminating chronological age as a factor. The method of 
“path coefficients” devised by Wright¹ was introduced as a 
means of determining a measure of the relative contributions 
of the remaining factors to individual differences in the 
educational age of ten-year-old school children. 
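
The elimination of one factor by partial correlation, as described above, may be illustrated by the usual first-order formula. The Python sketch below is offered only as an illustration of that formula; the function name and the three correlations are hypothetical and are not Heilman's coefficients.

```python
import math


def partial_correlation(r_12, r_13, r_23):
    """First-order partial correlation r_12.3 (variable 3 held constant).

    r_12.3 = (r_12 - r_13*r_23) / sqrt((1 - r_13^2) * (1 - r_23^2))
    """
    denom = math.sqrt((1 - r_13**2) * (1 - r_23**2))
    if denom == 0:
        raise ValueError("variable 3 is perfectly correlated with 1 or 2")
    return (r_12 - r_13 * r_23) / denom


# Hypothetical correlations; NOT the values reported by Heilman.
r_edu_mental = 0.6    # educational age with mental age
r_edu_chron = 0.3     # educational age with chronological age
r_mental_chron = 0.2  # mental age with chronological age

r_partial = partial_correlation(r_edu_mental, r_edu_chron, r_mental_chron)
print(round(r_partial, 3))
```

When the third variable is uncorrelated with both of the others, the partial coefficient equals the original correlation, so the adjustment matters only to the extent that chronological age is related to the measures concerned.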

6. The Hockett study to determine the more significant political, 
economic, and social problems of American life.² This study by 
Hockett was made for the purpose of determining the political, 
social, and economic problems and issues of American life about 
which children should become able to think intelligently. As 
a basis for answering this question, which asks what should 
be, he assumed that in a commonwealth such as ours the citizens 
need to understand the society in which they live and to be 
able to deal with the problems which they are likely to face. 
He further postulated that a consensus of expert opinion would 
afford the best answer to this question. His first step, therefore, 
was to determine the persons who might properly be recognized 
as “frontier thinkers” and their writings which represent “in 
the highest degree penetrating insight and critical analysis of 
contemporary life and problems.” In this step of his work 
Hockett sought the help of “150 specialists in the field of 
government, economics, sociology, law, the press, international 
affairs, immigration, geology, anthropology, and the field of 
artistic expression.” With the aid of the information 
contributed by these specialists he selected twenty-two books for 
analysis. 

¹ Wright, Sewall. “Correlation and Causation,” Journal of Agricultural 
Research, 20:567-85, January, 1921. 

² Hockett, J. A. “A Determination of the Major Social Problems of American 
Life,” Teachers College, Columbia University Contributions to Education, No. 281. 
New York: Bureau of Publications, Teachers College, Columbia University, 
1927. 101 pp. 

The analysis of these twenty-two volumes was, of course, 
subjective, but Hockett’s procedure was highly systematic 
and it is probable that the list of 396 issues which he identified 
constitutes a satisfactory determination of the consensus of 
opinion of the authors of the books analyzed. Hockett sought 
to validate the results of the analysis by a study of the reported 
events of American life over a period of five years. In 
identifying these events he analyzed the weekly summary of events 
under the heading of “current events” in the Literary Digest 
and the editorial comment on events appearing in the Outlook, 
the Independent, the New Republic, and the Nation. 

Characteristics of educational research revealed in these 
illustrations. These illustrations of educational research reveal 
certain significant characteristics. In each case there is a 
problem expressible in the form of one or more questions and 
this problem when adequately defined served as a guide in 
collecting the needed data and in interpreting them. Although 
the nature of the data and the techniques employed in 
collecting them varied, the procedure may be described as systematic. 
Careful examination of the reports reveals that in each case 
there was a distinct effort to secure as accurate data as possible. 
With the exception of Hockett’s study the method of collecting 
data was such that there was little or no opportunity for them 
to be influenced by any prejudice or preconceived opinion of 
the investigator. Hockett’s method of collecting data was 
subjective, but he shows that another analyst using his technique 
on a portion of the source material obtained very similar results. 
Although objective techniques for collecting data are desirable, 
they are not essential. The essential requirement is that the 
process of collecting be systematic and that the resulting data 
be as accurate and as valid as possible. 

Another characteristic, which is only suggested by the brief 
descriptions, is the attention to the faults of the data in 
interpreting them. Careful reading of the reports, however, makes 
clear that the several investigators recognized the faults of their 
data and endeavored to interpret the findings in accordance 
with these faults. This characteristic is perhaps most apparent 
in the Newark phonics experiment. 

In each of these studies the investigator dealt with a limited 
collection of data, but he was interested in a problem more 
general than that indicated by his data. Judd and Buswell 
were not interested in the eye-movements of the particular 
subjects studied except as they were indicative of the 
eye-movements of readers in general. Sexton and Herron were interested 
in the findings of the Newark phonics experiment as a basis 
for advising teachers in regard to the instructional procedures 
in teaching reading in the primary grades. Heilman sought 
measures of the relative influence of certain factors upon the 
achievement of a particular group of pupils as a means of 
generalizing concerning the factors contributing to school 
achievement. Hence, we may note as a characteristic of 
educational research the interpretation of the findings as indicative 
of general conditions and relationships. Not infrequently the 
generalization is limited to a relatively restricted population 
or otherwise qualified and sometimes it is pointed out that 
generalization is perhaps not justified, but an investigation 
whose findings are applicable only to the particular population 
studied does not typify educational research. In fact, it may 
be maintained that such inquiries do not meet the requirements 
of educational research. 





The characteristics of educational research may be epitomized 
as follows: 

1. An investigation designed to afford a basis for generalizing with 
reference to educational theory or practice. 

2. A problem whose definition serves as a guide in collecting the 
needed data. 

3. Wisely planned and systematic collecting of data. 

4. Critical interpretation of data with attention to their faults and 
any limitations implied by underlying assumptions. 

The first of these characteristics is somewhat indefinite and 
hence difficult to apply, but it is suggested as a means of 
distinguishing between investigations that justify the designation 
of research and those that may more appropriately be called 
service studies or applications of research techniques in 
administration and teaching. Logically the definition of the problem 
is the first step in research. Frequently, the definition of the 
problem is modified and extended as the work progresses. 
Usually, however, a well defined problem may be regarded as 
a characteristic of educational research. 

Sometimes we encounter the impression that educational 
research is largely a routine procedure and hence that when 
appropriate techniques are employed in collecting data the 
statistical treatment of them will yield the answers to the 
questions being studied. It is true that frequently there is much 
routine in connection with these two phases of educational 
research, but the total procedure cannot be described in terms 
of any definite techniques. It is essential that the collecting of 
the data be wisely planned with reference to the problem and 
that the process be systematic. With reference to the statistical 
treatment of the data, no criterion can be cited except the 
general one that the techniques applied be appropriate to the 
problem and to the data. In the studies described in the 
preceding pages, the techniques employed vary. In the Hockett 
study and in the Judd-Buswell study only the simplest of 
statistical techniques were employed. On the other hand, the 
Heilman study of the influence of certain hereditary and 
environmental factors on educational achievement called for a 
relatively elaborate statistical treatment including the 
application of the method of path coefficients. 

If the designation of “poor research” is admitted, the fourth 
characteristic is possibly more of a criterion of the quality of 
the work than of research, but it refers to an essential phase 
of the total procedure of “good research.” It is essential that 
a research worker be cognizant of the faults of his data and 
that he make due allowance for these faults in interpreting 
his findings. It is also essential that he be cognizant of the 
assumptions that may be implied by the problem or by the 
technique employed.¹ 

The plan of the following chapters. Chapter II deals with 
research problems and their definition in keeping with the 
postulate that “all educational research begins with a problem 
to be solved.” Chapters III, IV, and V deal with standard 
methods of collecting data, standard methods of handling data, 
and the faults of data and their effects. The student should 
acquire from his study of Chapters II, III, IV, and V a 
knowledge of the general procedures of educational research from 
the selection and definition of a problem through the collection 
and handling of data to their interpretation in formulating 
conclusions and generalizations. 

Chapters VI to XII differ from the preceding chapters in that 
each is devoted to a general type of research problem. While 
one purpose of these chapters is to provide information with 
respect to the techniques appropriate for the several types of 
problems, each of these chapters has the additional purpose of 
presenting an indication of what has been accomplished by 
educational research in the field being considered. 

¹ These include fundamental assumptions such as that children learn or that 
human traits are sufficiently stable to make measurement of them meaningful. 
Assumptions of this type are essentially postulates and are to be distinguished 
from those introduced when a particular procedure is employed. The 
calculation of a mean from a frequency distribution introduces the assumption that the 
mid-point of an interval is the average of the measures within it. A comparison 
of the mean score of a class with other means or with announced norms 
introduces the assumption of comparability. The use of a probable error formula in 
general assumes the sample to which it is applied is a random one. The 
interested reader may consult the Index for references to the treatment of 
assumptions in later chapters. He should also read Gray, J. S. “Scientific Postulates 
for Educational Research,” Educational Administration and Supervision, 19: 
18-24, January, 1933, and Scates, D. E. “Types of Assumptions in 
Educational Research,” Journal of Educational Psychology, 26:350-66, May, 1935. 
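
The footnote on assumptions above observes that computing a mean from a frequency distribution treats the mid-point of each interval as the average of the measures within it. A short Python sketch may make the consequence concrete. It assumes intervals given as (lower limit, upper limit, frequency) with the lower limit included and the upper excluded; the function name and the scores are hypothetical.

```python
from statistics import mean


def grouped_mean(intervals):
    """Mean from a frequency distribution of (lower, upper, frequency) rows.

    Every measure in an interval is treated as lying at the mid-point,
    which is the assumption the text calls attention to.
    """
    total = sum(freq for _, _, freq in intervals)
    if total == 0:
        raise ValueError("the distribution contains no measures")
    return sum((lo + hi) / 2 * freq for lo, hi, freq in intervals) / total


# Hypothetical raw scores and the same scores grouped in intervals of five.
raw_scores = [10, 11, 14, 15, 16, 19, 20, 24]
distribution = [(10, 15, 3), (15, 20, 3), (20, 25, 2)]

mean_from_raw = mean(raw_scores)
mean_from_groups = grouped_mean(distribution)
print(mean_from_raw, mean_from_groups)
```

In this illustration the two means differ because the scores do not center on the mid-points of their intervals; where they do, the grouped computation reproduces the mean of the original measures.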

Chapter XIII presents information with respect to the 
evaluation and summarization of educational research. 

In Chapter XIV consideration is given to the progress that 
research workers have made toward a science of education and 
an attempt is made to point out the general limitations of current 
research and to indicate the lines along which efforts should be 
directed if a science of education is to be developed. 

A list of statistical symbols is given as an appendix. The 
reader will find it helpful to consult this list when he encounters 
difficulty in understanding the symbolism in the following 
chapters. 

Suggestions for using the volume as a text. The volume is
somewhat encyclopedic in character so that it will be useful
as a source of information. An instructor using it as a text
will probably find that the objectives of the course suggest the
omission of certain sections. For example, if the course is de-
signed primarily to engender an elementary acquaintance
with statistical procedure and ability to interpret the resulting
statistics, most of the matter in Chapters II, III, VI, XII, XIII,
and XIV would probably be omitted. If the purpose of the
course is to engender an understanding of educational research
and an ability to read reports of statistical studies intelligently,
the omission of the more technical topics may be advisable,
especially those in Chapters X and XI. If the students have
previously had a course in statistical methods, Chapter IV may
be omitted. If educational measurements is dealt with in a
separate course, Chapter VII may be omitted.
EDUCATIONAL PROBLEMS AND THEIR
DEFINITION 

The educational problems being studied. The educational
problems to which research workers are giving their attention
are indicated by the titles of reports appearing in such peri-
odicals as Journal of Educational Research, Journal of Educa-
tional Psychology, Educational Administration and Supervision,
and School Review. They are possibly more typically represented
by lists of graduate theses in education. The following titles were
taken at random from the 1929-1930 Bibliography of Research
Studies published by the United States Office of Education.

1. A survey of the public schools of Imperial County, California

2. The educational ideas of Louisa May Alcott

3. The comparative influence of motion pictures in teaching
American history

4. Analysis of integration, a study of the relationship between
eye, hand, and foot response mechanisms

5. A study of the relation between developmental age and some
physical measurements

6. A critical study of standardized mechanical aptitude tests

7. Bureaus of research in public school systems with reference to
cities of 10,000 population or less

8. Kinesthetic factors in the learning of reading and spelling

9. A study of certain sound letter confusions in spelling in grades
two to six

10. Relationship of reading ability and success in high school
English in the junior class of the Milne high school

11. How literary artists of the nineteenth century were influenced
by current psychology and philosophy in delineating children

12. What skills in mathematics are necessary in order that a student
may do the mathematics required by some colleges in the first
year of a course leading to a B.A. degree

13. Some number abilities of beginners in rural and town schools

14. A course in general science

15. An evaluation of certain standard tests in high school physics

16. Methods in the teaching of high school history

17. A study of the dominant characteristics of adolescent children
having superior untrained musical talent

18. A study of the clothing weights and physical activity together
with the possible correlation of these in the Merrill-Palmer
nursery school

19. State high school standardization

20. Relationship of scores obtained by junior high school pupils in
the Rogers physical fitness tests to their mental ability and
achievement

Some of these titles are not very definitive but the list is
indicative of the wide range of problems being studied in the
field commonly designated as educational research. This field
has no definite boundaries. It obviously overlaps that of
psychology and sociology. There are occasional excursions
into other fields. A topical index of approximately 3650 reports
of educational research, and related materials, for the period
1918-1927, not including articles in periodicals, included 605
items exclusive of duplications. It may be unfortunate that
educational research has not been confined to a more limited
field, but a more restricted definition of the field would not be
in conformity with prevailing practice.

A classification of the problems of education. The picture
of the field of educational research afforded by the preceding
paragraphs is general. Several writers have attempted a more
meaningful description by listing types of educational research,
but there is little agreement in the classifications proposed.
The diversity of view is illustrated in “A Symposium on the
Classification of Educational Research” published in the
Journal of Educational Research, May and June, 1931. Whitney¹
has tabulated the rubrics in twelve classifications. No rubric of
the twenty-two that appear in the table was recognized by all
twelve of the authors and only two rubrics (experimental and
survey) were listed by as many as eleven of the authors.

¹ Whitney, F. L. Methods in Educational Research. New York: D. Appleton
and Company, 1931, pp. 72-73.
One explanation of this lack of agreement in these classifica-
tions is to be found in the different points of view from which
analyses have been made.¹ Most writers have attempted
differentiation on the basis of techniques employed. This
approach leads to difficulty. It has been asserted that “all
research involves the use of statistical methods.”² A classifica-
tion of problems according to the field of investigation results
in a bewildering array of rubrics unless the fields³ are very
broad and such an analysis is not very useful. For the purpose
of this text, an analysis on the basis of the type of question or
problem is proposed.⁴ Questions asking what has been (his-
torical) and questions asking concerning the status of present
conditions and practices (survey) are obvious as general types.
A number of questions relate to the measurement of human
abilities and traits. Studies of relationship are grouped under
three heads. The determination of what should be or the answer-
ing of questions of value represents another type of problem,
and the evaluation and summarization of research is listed as a
final type.⁵

1. Historical 

2. Measurement (Construction and validation of measuring in- 
struments) 

3. Survey (Determination of status of conditions and practices) 

4. Experimental (Determination of relative effectiveness of com- 
parable procedures) 

5. Concomitant variation (Prediction) 

¹ Freeman, F. N. “A Symposium on the Classification of Educational Re-
search,” Journal of Educational Research, 24: 16-19, June, 1931.

² Symonds, P. M. “A Course in the Technique of Educational Research,”
Teachers College Record, 29: 24-30, October, 1927.

³ Ayer has proposed fifteen fields of administrative research. Ayer, F. C.
“Administrative Research in Public Administration,” The Nation’s Schools,
2: 14, September, 1928.

⁴ The organization of Chapters VI-XIII of this text conforms to this classifica-
tion. The reader should consult these chapters for further explanation of the
several rubrics.

⁵ The reader should note that there is no specific mention of problems relating
to the development of statistical procedures. Most of such problems occur
incidentally in dealing with the types designated by the second, fourth, fifth,
and sixth rubrics.
6. Causal (Identification of causes and measurement of relative
contributions)

7. Determination of values, or of what should be

8. Evaluation and summarization of research 

Relative importance of the several rubrics of problems. If a
group of superintendents, principals, and teachers were asked
to list the problems that they consider important, a large pro-
portion of the questions would belong under the head of survey
research. The practical schoolman is interested in the status
of his school system and of the several units within it. He
desires to know concerning practices and conditions in other
schools. Beyond such inquiries his interests are scattered.
Sometimes he asks concerning the relative merits of alternative
procedures and practices. Occasionally he is interested in causes
of certain conditions and the effects of certain procedures.
If, however, we consider the question of the relative importance
from the point of view of a science of education which is con-
ceived of as a systematized collection of general facts and
principles, the answer will be different. Survey investigations,
which are frequently characterized as fact-finding studies, will
be assigned a place of minor importance.

The problems fundamental to a science of education.¹ Since a
large portion of educational research involves the measurement
of human abilities and traits, the problems of measurement
are fundamental and the most fundamental are those having
to do with the identification and specification of the abilities
and traits whose measurement is desired. Some progress has
been made toward determining what general intelligence tests
measure and there is some information concerning the nature
of achievement in the various subject-matter fields. For example,
it has been shown that what many achievement tests measure
includes as a major factor the same ability that is measured

¹ The problems noted here imply certain assumptions. For example, the
problem of measurement assumes a sufficient degree of stability in human
abilities and traits to make measures of them meaningful. The identification
and formulation of such assumptions might be included as one of the problems
fundamental to a science of education.
by typical tests of general intelligence. We commonly speak
of mathematical ability but a recent investigation¹ indicates
that there is no general mathematical ability. Several inves-
tigators have interpreted their data as indicating that ability
in the field of arithmetical calculation is highly specific, but the
total evidence is not consistent and the identification of the
abilities that function in arithmetical calculation is still a prob-
lem. Another problem under this head involves the question
of permanency of achievement. When a measure of achievement
is secured it is conceived of in terms of future performance, but
the date of this performance is rarely specified. We need to know
how rapidly an ability will deteriorate during the period follow-
ing the date of testing and how to predict the performance
it will make possible at any specified date.

A second fundamental problem is the identification of
causes. Questions under this head ask, “What are the causes of
. . . ?” “Does . . . contribute to . . . ?” “What are the factors
that contribute to . . . ?” An important problem under this
rubric is to determine the factors (instructional techniques,
size of class, etc.) that affect learning.

A third problem may be expressed as follows: Given a rela-
tionship involving as causes such factors as instructional
techniques, the personality of the teacher, size of class, and
general organization of the school, to determine the effect upon
the dependent variable of specified changes in a designated
cause. The dependent variable is usually a measure of a
specified segment of the achievement of a group of subjects.
The mean of the measures of the achievement is most commonly
used, but the dependent variable may be some other function of
the distribution. The problem of the effect of training (practice)
upon individual differences is an illustration of the use of a
measure of variability. An important special case under this
general problem is formed when the varying independent vari-
able is time, as in the constancy of the IQ and other genetic
problems.

¹ Cairns, George J. “An Analytical Study of Mathematical Abilities,”
Catholic University of America, Educational Research Monographs, Vol. 6, No. 3.
Washington, D. C.: Catholic Education Press, April, 1931. 104 pp.
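
The distinction drawn above between the mean and "some other function of the distribution" as the dependent variable may be sketched as follows. The measures are invented for illustration, and the choice of the standard deviation as the measure of variability is an assumption made here, not a procedure prescribed by the text.

```python
from statistics import mean, pstdev

# Hypothetical achievement measures for one group (invented for illustration).
before_training = [40, 50, 60, 70, 80]
after_training = [55, 60, 65, 70, 75]

def summarize(measures):
    """Return two functions of the distribution: its mean and its
    standard deviation (used here as the measure of variability)."""
    return mean(measures), pstdev(measures)

mean_before, sd_before = summarize(before_training)
mean_after, sd_after = summarize(after_training)

# Effect of training on the level of achievement.
change_in_mean = mean_after - mean_before
# Effect of training on individual differences (negative means less spread).
change_in_variability = sd_after - sd_before
print(change_in_mean, change_in_variability)
```

With these invented figures the mean rises while the variability falls, which shows that the two dependent variables answer different questions about the same change in a designated cause.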

The measurement of the contributions of given causes
furnishes a fourth fundamental problem. Closely related is the
problem of determining the status of the various factors which
constitutes the optimum conditions for engendering specified
achievements. At present we have hypotheses concerning these
optimum conditions. The advocates of the activity curriculum
have proposed a general description of the conditions which they
insist will result in maximum learning. Other groups of educa-
tional theorists proclaim confidence in other conditions. But our
actual knowledge concerning optimum conditions for learning is
exceedingly fragmentary and until we have a dependable deter-
mination, our study of many of the problems relating to the
training and selection of teachers, and the organization and
administration of our schools will not have a secure foundation.

The identification of the factors having a predictive value
and the determination of the best basis for making desired
predictions within specified populations furnishes a fifth funda-
mental problem.

The sixth fundamental problem is the determination of the
objectives of education. The questions involved ask what
should pupils learn or what pupil achievements should be
desired. Some writers, especially those with training in the
field of philosophy, contend that these questions are outside of
the realm of educational research. It is true that the answers
cannot be developed entirely from objective data, and this fact
has been recognized by certain writers who list philosophical in-
quiry as a type of educational research. Whether philosophical re-
search is or is not recognized as a type is not important, but it is
important to realize that the determination of the goals of educa-
tion is a fundamental problem. Until these goals are specified in
terms of measurable pupil controls of conduct, any determination
of optimum learning conditions, which include methods and
materials of instruction and the organization and administration
of our schools, will rest on assumptions or arbitrary postulates.
In this exposition of the problems fundamental to a science
of education there has been no mention of the population for
which they should be studied. With the exception of those
relating to measurement, most of the questions asked should
be studied for several populations. Hence, the reader should
think of the fundamental problems noted as very general. In
defining one of them a particular population should be specified.

Discovering problems for graduate theses in education. A
large proportion of first-year graduate students and all candi-
dates for the doctorate face the task of discovering and selecting
a suitable thesis problem. The preceding reference to the funda-
mental problems of education may suggest to the reader that
the graduate student should direct his attention to them and
endeavor to formulate a problem whose solution will give
promise of contributing to the development of a science of
education. Unfortunately first-year students and most candi-
dates for the doctorate do not have the resources and the time
required for dealing with most of the fundamental problems,
and hence in general such investigations must be left to experi-
enced persons who have adequate resources and who may, if
necessary, continue the research over a period of years. Hence,
graduate students, especially candidates for the master’s
degree, are restricted to survey investigations or other types
of inquiries that are commensurate with their training, re-
sources, and time. With this general restriction in mind, we
may consider how suitable problems may be discovered.

Difficulties encountered in the course of practical activities
are a fruitful source of problems. Hence, it may appear that
the graduate student who has taught or served in an adminis-
trative position should approach his thesis work with a num-
ber of problems in mind. It is, however, unusual for a student
to begin his graduate study with a definite problem in mind.
Several factors contribute to this condition. Many teachers
and administrators are not sensitive to difficulties and conse-
quently do not become aware of many potential sources of
educational problems. Even when a difficulty has been recog-
nized, it is frequently not easy to identify the problem and to
state it in satisfactory terms. Furthermore, practical problems
are frequently difficult to deal with in a scholarly way. Hence,
the practical experience of a graduate student is seldom a
fruitful source of appropriate problems.

It is helpful to examine a number of graduate theses and other
reports of research. Frequently an author suggests problems
for study. Other questions may grow out of a critical reading
of a report of educational research. The possibility of em-
ploying different techniques for the studying of the same prob-
lem may occur to the reader. Writings about research and re-
search techniques frequently suggest problems for study. Some
writers have discussed the need for research in particular fields.¹
Critically minded instructors frequently suggest problems in
connection with their courses and the alert graduate student
will usually be able to accumulate a list of problems from the
courses he is taking. Term reports, especially summaries of
the research relating to a given topic, usually suggest several
problems for study.

The alert graduate student should encounter no difficulty in
discovering a number of problems. Whenever a problem occurs
to him it should be recorded. It is desirable to formulate it
in concise terms, preferably in question form. A topical state-
ment is seldom very definitive.

Selecting a problem for a graduate thesis.² The reader should
note that the reference in this paragraph heading is to a “prob-
lem” rather than to a “topic.” The student should think in
terms of questions or groups of questions rather than topics.
Until the problems being considered are conceived of in terms
of questions to be answered, the student cannot wisely evaluate
the possibilities.

It is frequently stated in the requirements of a graduate

¹ See bibliography at end of this chapter.

² Although the following reference is directed to another audience, a student
selecting a problem for a graduate thesis may study it with profit:

Symonds, P. M. “Common Faults in Graduate Research in Education,”
Journal of Educational Research, 27: 481-92, March, 1934.
school, especially those which have been set up for the degree
of doctor of philosophy in education, that the thesis shall rep-
resent an “original contribution.” The term “original contri-
bution” does not appear to possess any very precise meaning.
Is a problem that has been studied thereby disqualified as a
basis for an “original contribution”? The findings of many
researches in the field of education, especially the earlier ones,
are not dependable. Many of the problems studied are in need
of reinvestigation with the utilization of more adequate tech-
niques. It is the opinion of the present writers that very fre-
quently problems which have been studied are appropriate for
further research. An “old” problem may be considered suit-
able for a thesis, provided the student is able to apply improved
techniques or to apply more skillfully procedures previously
employed. In the case of a fundamental problem, the repetition
of an investigation for the purpose of verification may be
justified.

Some persons do not recognize the synthesis, interpretation,
and application of the findings of other investigators as “orig-
inal contributions.” Sometimes this position is qualified by
saying that such undertakings are acceptable as masters’ theses.¹
The science of education is in need of evaluation, interpretation,
and synthesis of the findings of educational research.² The evi-
dence scattered throughout the research literature in support of
hypotheses, rules, and principles needs to be critically exam-
ined and the dependable findings synthesized so that the justifi-
cation for accepting these hypotheses, rules, and principles
may be established. There is also urgent need for the interpreta-
tion of this scattered evidence in terms of applicability to edu-
cational practice. It appears, therefore, that a critical, scholarly
summary may be recognized as an “original contribution,”

¹ Several institutions accept for the doctor’s degree theses which represent
“organizations and applications of existing knowledge.” This modification is
more characteristic, however, of the institutions which grant the degree of doctor
of education. See Monroe, W. S. “A Survey of the Requirements for the Doctor
of Philosophy in Education,” School and Society, 31: 655-61, May 17, 1930.

² A more extended treatment of this point is given in Chapter XIII.
provided the field of the summary has been sufficiently studied.
The question of accepting a critical summary in partial fulfill-
ment of the requirements for a graduate degree involves other
considerations. To the present writers the acceptance of a
critical summary does not seem inappropriate in the case of
the master’s degree.¹

The contributory value of a mere fact-gathering investigation
is usually slight. A similar statement may be made with refer-
ence to studies whose significance is primarily local. In general
studies of relationships have a higher contributory value than
survey investigations. Furthermore, the possibility of making
a contribution is greater when the study is based upon or is
related to previous researches. Educational research has been
characterized as fragmentary, which means that the efforts of
investigators have not been coordinated and that when a syn-
thesis of findings is attempted they often are so unrelated that
dependable generalization is not possible. In many cases the
situation is made still more unsatisfactory by reason of the
fact that a number of the investigations are superficial or have
dealt with trivialities.

In addition to the possible contributory value of the research,
the student should consider the educative opportunity it will
afford him. He may properly regard the thesis requirement as a
major learning exercise. If the work involved is largely routine
and mechanical, the student will learn very little even if he
succeeds in satisfying the institutional requirement.

The feasibility of the procedures for attacking the problem
should also be considered. Many important problems are un-
suitable for graduate theses because the indicated procedures
are not feasible or needed instruments and techniques have not
been developed. A phase of the feasibility of the indicated
procedure is the time required to complete a proposed inquiry.
It is characteristic of inexperienced persons to underestimate

¹ For discussion of certain aspects of the thesis requirement for the doctorate,
see Buswell, G. T. “Research and the Degree of Doctor of Philosophy in Edu-
cation,” Journal of Educational Research, 23: 146-52, February, 1931.
the time element and a graduate student should seek the advice
of an experienced person on this point. It is much better to
study intensively a restricted problem than to spread out one’s
efforts over a larger problem with the result that the research
deserves the characterization of superficial.

Finally, the student should consider his training and his
resources. A problem may be inappropriate for a particular
student because his training has not provided him with the
necessary background. For example, a student should not at-
tempt a curriculum problem if he has not taken one or more
basic courses in this field. Neither should a student attempt a
problem which will require intricate statistical techniques if he
has not had adequate statistical training. A student should not
attempt an experimental problem until he has attained a basic
understanding of the experimental procedure. The student
should also consider his resources for collecting the needed data
and for handling them. Some problems require that the col-
lection of data extend over two, three, or even more years.
Collecting data for other problems requires considerable expense
for test materials or for other purposes.

Defining a problem. In order to be effective as a guide in the
subsequent phases of the research, a problem must be defined
so that there will be a concise and complete statement of the
specific question or questions to be answered. For example,
consider the problem, “What is the relation between achieve-
ment in algebra and in Latin?” As stated this problem is very
general. It should be restricted. One desirable restriction is
with reference to the period during which the two subjects are
studied. Another relates to whether the two subjects are studied
simultaneously or in sequence. When these restrictions are in-
corporated, we have a statement of the following type: “What
is the relation between achievement in first-year algebra and
achievement in first-year Latin when the subjects are studied
simultaneously in the ninth grade?” Further definition is
needed with reference to the student group to be considered.
As stated, the question applies to all students in all schools.
One basis of restriction in this respect is the requirement or
non-requirement of the subjects being considered. Finally a
definition of achievement is needed. When all of these items
are incorporated the statement would appear as follows:

What is the relation between achievement (as defined) in first-
year algebra and achievement (as defined) in first-year Latin when
these subjects are studied simultaneously in the ninth grade in high
schools which require algebra but make the study of a foreign lan-
guage elective and offer one modern language in addition to Latin?
Students repeating one or both of these subjects are to be excluded.

The problem may be further limited to public high schools and
even to those of a certain size and within a certain area.
The definition of a problem should result in a precise and
complete statement of the questions to be answered. Attention
must be given to the terms used. In the illustration just given
attention was called to the necessity of defining “achievement.”
“Grades received” might be used instead of achievement, but
to do so would introduce a significant change in the problem.
Similarly, there is a difference between “costs” and “expendi-
tures”; “test scores” and “measures of achievement”; “in-
telligence as measured” and “intelligence.” Frequently, the
problem analyzes into a group of questions. Each of these
should be stated or at least definitely indicated.¹
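
The way in which the restrictions of the algebra and Latin illustration determine which cases enter the computation may be sketched as follows. Everything in the sketch is hypothetical: the records are invented, the field names are chosen only for illustration, the school-level restrictions are omitted for brevity, and the product-moment coefficient is assumed as the index of relationship, since the text leaves that choice to the definition of the problem.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class Pupil:
    algebra: float      # achievement (as defined) in first-year algebra
    latin: float        # achievement (as defined) in first-year Latin
    grade: int          # grade in which the subjects are studied
    simultaneous: bool  # True if both subjects are studied in the same year
    repeating: bool     # True if the pupil repeats one or both subjects

def in_scope(p: Pupil) -> bool:
    """Apply the restrictions written into the definition of the problem."""
    return p.grade == 9 and p.simultaneous and not p.repeating

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation between two sets of measures."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented records; the last three fall outside the defined problem.
pupils = [
    Pupil(70, 65, 9, True, False),
    Pupil(80, 85, 9, True, False),
    Pupil(90, 80, 9, True, False),
    Pupil(60, 70, 9, True, False),
    Pupil(55, 50, 9, True, True),     # repeating a subject
    Pupil(85, 90, 10, True, False),   # not in the ninth grade
    Pupil(75, 72, 9, False, False),   # subjects studied in sequence
]

selected = [p for p in pupils if in_scope(p)]
r = pearson_r([p.algebra for p in selected], [p.latin for p in selected])
print(len(selected), round(r, 3))
```

Changing any one restriction in `in_scope`, or substituting grades received for the achievement measures, changes the cases selected or the values correlated, which is the sense in which such a substitution changes the problem itself.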

The function of the definition of a problem. Although a
problem may be discovered when examining an accumulation of
facts, the logical sequence is the problem first and then the
collection of data to fit the problem. In this sequence the
definition of the problem serves as a guide in determining what
data should be secured and in collecting them. This function
of the definition of a problem is more important in some cases
than in others. In historical research, case studies, and a few
other types of problems, a precisely defined problem may not
be necessary because the purpose is to discover the implications
of the data that it is possible to collect. In general, however,
the researcher should endeavor to define his problem before

¹ For a discussion of defining experimental problems see Chapter IX.
the collection of data is begun and it is wise to reduce this
definition to writing. Without a clearly defined problem, an
investigator may waste time in collecting data. He may col-
lect unnecessary data. He may fail to collect some essential
data. It may happen that the data collected are not those
called for by the problem.

The definition of the problem enables one to anticipate the
statistical treatment of his data. Occasionally a graduate stu-
dent or other investigator collects data and then finds that he
does not have command of the statistical techniques required
for dealing with them. A few years ago a graduate student
who had only incidental training in statistical methods came
to the senior author to ask how he should handle the data he
collected. The problem, which had not been adequately defined,
required the use of partial correlation. The student found the
mastery of this technique too much of an undertaking and se-
lected another problem for his thesis. The labor he had ex-
pended in collecting his data was wasted. Even persons of ex-
perience in educational research sometimes do not realize the
difficulties in store for them until the problem is adequately
defined. In reporting his study of the influence of nurture upon
intelligence T. L. Kelley¹ confesses in the preface that although
he had discussed the problem for years and had believed it to
be a relatively simple one, he found that he had never clearly
defined either “nurture” or “nature.” He also found that the
necessary statistical techniques had not been devised.
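
For the reader who has not met partial correlation, the technique mentioned in the anecdote above, the standard first-order formula may be sketched as follows. The three coefficients supplied are hypothetical; the text does not report the student's variables or data.

```python
from math import sqrt

def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order coefficients (assumed for illustration only):
# x and y might be two achievement measures, z a measure of intelligence.
r_xy, r_xz, r_yz = 0.60, 0.50, 0.50

r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)
print(round(r_xy_given_z, 3))
```

The sketch shows why the need for the technique can be anticipated: a problem defined as a relation between two variables "with a third held constant" calls for three zero-order correlations, and hence for data on all three variables, before any collection is begun.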
CHAPTER III


COLLECTING THE DATA SPECIFIED 
BY A PROBLEM— BASIC TECHNIQUES 

Educational research not a mechanized routine. Although
relatively definite techniques will be described in this and
several of the following chapters, the reader should not infer
that educational research can be reduced to a mechanized
routine. A problem, whose study is worthy of designation as
educational research, represents a “new” situation. In meeting
this situation the research worker must define the problem and
then determine what technique, or techniques, are appropriate
for collecting the necessary data. In making this decision he
must consider not only the nature of the problem but also the
conditions which he will likely encounter in collecting his data.
For example, he may have to choose between two techniques
on the basis of their relative feasibility with respect to the
conditions current at the time of the investigation. After a
technique has been determined upon, it is frequently necessary
to make adaptations to the problem and to the conditions under
which the research is conducted. Sometimes considerable
ingenuity is required to obtain the needed data. The handling
of the data collected may be a relatively routine activity but
the interpretation of the derived statistics is likely to require
critical reflective thinking of a high order. Hence, the reader
of this text should, at all times, bear in mind that educational
research cannot be conducted in a routine fashion.

Types of data. When reference is made to the data of educa-
tional research, one is likely to think of test scores, school
marks, chronological ages, teachers’ salaries, counts of things
such as pupils or language errors, and the like. The term,
however, is used with a broader meaning. In dealing with some
educational problems it is necessary to secure statements of
beliefs or opinions, statements descriptive of schools or events,
conclusions reached by previous investigators, or historical
information. Hence, data should be thought of as including all
facts, concepts, and principles useful in deriving answers to the
questions listed in the definition of a problem.

Data that represent measures or counts of things are expressed 
in numerical terms and are commonly referred to as quantita- 
tive. Statements of beliefs or opinions, descriptions of schools 
or events, and the like are referred to as non-quantitative data.

Objective data versus subjective data. The data of educa- 
tional research are often regarded as subjective when they 
consist of expressions of judgment made by subjects participating
in the investigation as sources of data. From this point of
view, responses to many questionnaires would be labeled
“subjective data.” It is more desirable to regard as subjective,
data obtained in such a manner that they may be influenced 
by the person collecting them. For example, observations of 
the behavior of small children made by a research worker 
would be subjective because, in collecting the data, he gives 
meaning to his sensations in perceiving overt acts. To the extent 
that another research worker making the same observations 
would be likely to give different meanings to the “sense data,”
we are justified in regarding the recorded observations as 
subjective. 

The converse of this statement gives the definition of objectivity
which will be used in this book. When data are such
that there has been very little or no opportunity for them to
be affected by the prejudices, opinions, and judgment of the
person collecting them, they are to be regarded as objective.

It should be noted that the subjectivity of data is a matter
of degree. Certain types of data are regarded as “highly subjective”
because experience has shown that in general two
persons working independently will obtain significantly different
data. This is true in the case of marks assigned to examination
papers of the essay type. Other types of data are known to be
only slightly influenced by the person who collects them and
may be described as “slightly subjective.” Sometimes such
data are designated as “objective” but it is wise to reserve
this term for data whose degree of subjectivity approaches
zero. In other words, “subjective” has a somewhat broad and
relative meaning while “objective” has a more restricted 
meaning. 

The problem a guide in collecting data. The problem, when 
adequately defined, specifies the data to be collected, and 
usually the items collected should be limited to the requirements 
of the problem. Collecting unnecessary data consumes time
and may distract the attention of the investigator from the
issues of his problem. Occasionally it may appear desirable to
redefine the problem so as to extend its scope, or the investigator
may find that it is feasible to collect data for one or more related
problems without adding greatly to the time required for this
phase of the work. It is unwise, however, to collect data merely
because they are thought to be interesting. Data which are not
used represent wasted effort.

The basic techniques for collecting data. Although the 
details of the procedure followed in a given problem are deter- 
mined by its nature and the availability of sources and instru- 
ments, there appear to be only a few basic techniques. In the 
following exposition, seven are recognized.¹

1. Copying data from records and published sources. 

2. Analyzing texts, records of activities, etc.

3. Interviewing. 

4. Constructing and using questionnaires.

5. Securing a current record of activities, etc. 

6. Making estimates and ratings. 

7. Selecting and administering tests. 

1. Copying data from records and published sources. When 
data are to be copied from records or published sources, the 
principal considerations are accuracy and economy of time. 

¹ Locating published sources of information and collecting data of the sort
that are utilized in dealing with philosophical problems or in preparing a
summary of research are dealt with in Chapters XII and XIII.

Accuracy in copying is attained when the entries made are
identical with those in the source. The number of errors can
be reduced to a minimum by employing certain techniques and
devices, but checking is necessary to insure accuracy.

The details of the process of copying and the form in which
the copied items should be arranged vary with the nature of
the source and the use that is to be made of the data. A resourceful
person will usually think of devices that will facilitate
the work as well as reduce the errors of copying to a minimum.
When data are being copied from several columns of a table, a
strip of cardboard may be used to assist in following a line across
the table. Alexander¹ has suggested a helpful device when
data are to be copied from only certain columns of a table.
It consists of a strip of paper, or cardboard, one edge of which 
is notched so that when it is placed horizontally on the table, 
the items to be copied are exposed and the irrelevant items are 
masked. 

In copying numerical data from records and published sources 
it is important to label accurately each item or each group of 
similar items. What this means may be illustrated by reference 
to numerical data recorded in a publication of the United States 
Office of Education.² For example, Table 10 of this publication
gives data pertaining to students in state normal schools.³ Any
item copied from this table should be accompanied by the words,
or phrases, which give meaning to the datum. For example,
one figure given in the table may be copied “2805 women resident
state normal school students enrolled in regular session
in New Jersey, 1927-28.” The fact that this figure includes
both white and colored women students should probably be
recorded also. The figure 7291 when copied should be given
the label: “resident men students in all courses, excluding
duplicates, enrolled in state normal schools of continental
United States, 1927-28; including both white and colored men
students.”

¹ Alexander, Carter. School Statistics and Publicity. Boston: Silver, Burdett
and Company, 1919, p. 85.

² Phillips, F. M. “Statistics of Teachers Colleges and Normal Schools,
1927-28,” United States Bureau of Education Bulletin, 1929, No. 14. Washington:
Government Printing Office, 1929. 71 pp.

³ Ibid., p. 19.

If the precise meaning of the label is not apparent,
it should be ascertained and recorded. Such labels as “per pupil
in average daily attendance,” “per cent,” and even “enrollment”
may vary in meaning.
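
The practice of labeling each copied item may be sketched in a modern notation. The following is only an illustration, not a procedure given by the authors: the record type CopiedDatum and its field names are assumptions introduced here, while the figure and its label are those quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopiedDatum:
    """One figure copied from a source, with the words that give it meaning.

    The class and its field names are hypothetical; they merely separate
    the parts of the label discussed in the text.
    """
    value: int
    population: str   # who or what is counted
    scope: str        # institutions or region covered
    period: str       # school year to which the figure refers
    notes: str        # inclusions and exclusions stated by the source
    source: str       # where the figure was found

    def label(self) -> str:
        # Join the parts into a single label kept with the figure.
        return (f"{self.value} {self.population}, {self.scope}, "
                f"{self.period}; {self.notes}")


datum = CopiedDatum(
    value=7291,
    population="resident men students in all courses, excluding duplicates",
    scope="state normal schools of continental United States",
    period="1927-28",
    notes="including both white and colored men students",
    source="Phillips, Bulletin, 1929, No. 14, Table 10, p. 19",
)
print(datum.label())
```

Keeping the source in the same record as the figure means that a copied item is never separated from its citation when the items are later sorted or tabulated.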

Economy of time in copying is attained by devising an
appropriate plan of work and by employing special devices
such as the one described by Alexander. Contributions are
made to economy in the total time of the research by copying
the data in a form that will facilitate their tabulation and by
making certain that all of the needed data are copied when the
source is at hand.¹ Frequently much time can be saved by
anticipating the tabulations to be made from the copied data.
Items copied from several sources which pertain to the same
topic can be copied on the same card, or page. Items can be
grouped so that calculation of measures of central tendency,
such as means, is greatly facilitated. Occasionally, it may be
desirable to plan the tables which are to appear in the reported
research and to organize the copying of numerical data so that
blanks in the prepared tables may easily be filled in. Careful
planning reduces the probability of omitting needed items in
copying. Furthermore, careful planning will tend to restrict
the tendency to copy interesting but irrelevant data.

¹ Failure to record an adequate bibliographical reference is not an uncommon
fault of inexperienced investigators.

Although it is not a part of the process of copying, the investigator
should inquire concerning the accuracy of the sources.
Sometimes one source may be checked against another. An
illustration of this is found in the work of Rugg in the St. Louis
Survey. In this investigation Rugg checked the statistics of the
United States Bureau of Education against the statistics of the
United States Bureau of Census and showed, for example, that
“the per pupil cost for salaries and expenses of supervisors,
principals, salaries of teachers, repairs, textbooks, salaries of
janitors, obtained from the two sources . . . show a very
satisfactory agreement. . . .”¹ The investigator who utilizes
numerical data from records and published sources should regard
his data in somewhat the same manner as the historical research
worker. He should seek to establish the validity and
accuracy of his data by attempting to secure “affirmations of
independent witnesses.”
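
The checking of one source against another may be illustrated by a brief sketch. It is offered under stated assumptions only: the function names, the five per cent tolerance, and the dollar figures are hypothetical and are not taken from the St. Louis Survey or from either Bureau.

```python
def relative_difference(a: float, b: float) -> float:
    """Absolute difference between two figures as a fraction of their mean."""
    mean = (a + b) / 2
    return abs(a - b) / mean if mean else 0.0


def compare_sources(first: dict, second: dict, tolerance: float = 0.05) -> dict:
    """For each item reported by both sources, tell whether the figures agree.

    Agreement here means a relative difference no greater than `tolerance`,
    an arbitrary threshold chosen for the illustration.
    """
    shared = first.keys() & second.keys()
    return {item: relative_difference(first[item], second[item]) <= tolerance
            for item in sorted(shared)}


# Hypothetical per pupil costs in dollars, invented for the example.
source_a = {"teachers' salaries": 30.00, "textbooks": 1.00, "repairs": 2.00}
source_b = {"teachers' salaries": 30.60, "textbooks": 1.50, "repairs": 2.00}

agreement = compare_sources(source_a, source_b)
for item, agrees in agreement.items():
    print(item, "agree" if agrees else "CHECK")
```

An item on which the two sources disagree is not thereby shown to be wrong in either; it is only marked for further inquiry into how each source defined and collected the figure.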

Finally, when copying data from records and published 
sources, it is important to make a complete record of the exact 
source from which the data are being taken. Such a record
serves three purposes. First, the source may be more easily
located again if further data are needed. Second, the source
may be more easily located for checking the accuracy of the data
copied. Third, it is very desirable to present in the report of
the investigation adequate and accurate citations of the
sources of the numerical data. It is not scholarly to do otherwise.

2. Analyzing textbooks, courses of study, records of activities,
and the like. In analyzing textbooks, the investigator may
wish to determine what topics are treated, what words are used,
what types of learning exercises are given, and so on, or his purpose
may be that of determining the amount of space given to
topics, the frequency and order of appearance of words, the frequency
of different types of learning exercises, and the like. In
analyzing courses of study he may wish to determine what
courses are offered and the characteristics of these courses.
Pupil writings may be analyzed with respect to language errors. 

The identification of words, language errors, topics, and other 
characteristics requires the recognition of certain criteria. The
problem, when adequately defined, specifies or at least implies
the criteria to be observed. For example, if the problem calls
for the determination of the arithmetical processes needed in
solving problems given in commonly used chemistry texts,
the meaning of the term “arithmetical processes” furnishes a
criterion. More precise definition may specify the identification
of particular number combinations.

¹ Rugg, H. O. “Public School Costs in St. Louis,” Survey of the St. Louis
Public Schools, Part III, Finances. Yonkers-on-Hudson, New York: World
Book Company, 1918, p. 21.

The analysis usually requires careful reading of the texts, 
courses of study, or other materials during which the items 
that satisfy one or more of the criteria are identified and copied
on data sheets or cards. Occasionally the reading may be completed
before the work of recording is begun. In such cases the
investigator may check with suitable symbols the items that
are to be copied later. This is an excellent procedure when a
trained analyst is needed in the identification of the items, but
the work of copying may be left to clerks.¹

The criteria to be observed may be simple. For example, if 
pupil compositions are being analyzed for misspelled words, the 
spellings given by a dictionary furnish the criterion.² When the
problem is to determine the “vocabulary load” of textbooks,
the Thorndike Word Book³ is often used as the criterion. This
list contains Thorndike's determination of the ten thousand
most commonly used words in the English language and when
used as a criterion the analysis becomes the identification of
the words not found in this list. The analysis may be made
more detailed by identifying the words in the first three thousand,
those in the next two thousand, and so on.

¹ The following references may be consulted for further discussions of the
techniques used in analyzing textbooks:

Fowlkes, J. G. Evaluation of School Textbooks. New York: Silver, Burdett and
Company, 1923. 33 pp.

Franzen, R. H., and Knight, F. B. Textbook Selection. Baltimore: Warwick
and York, 1922. 94 pp.

Hall-Quest, A. L. The Textbook: How to Use and Judge It. New York: The
Macmillan Company, 1918. 265 pp.

Maxwell, C. R. The Selection of Textbooks. Boston: Houghton Mifflin Company,
1921. 138 pp.

Spaulding, F. E. Measuring Textbooks. New York: Newson and Company,
1922. 40 pp.

² There will be need for a few supplementary rules to furnish a basis for
distinguishing between misspellings and errors of diction or grammatical errors.

³ Thorndike, E. L. The Teacher's Word Book. New York: Bureau of Publications,
Teachers College, Columbia University, 1921. 134 pp.

The use of this criterion is illustrated in the following references:

Lively, B. A., and Pressey, S. L. “A Method for Measuring the Vocabulary
Burden of Textbooks,” Educational Administration and Supervision, 9: 389-98,
October, 1923.

Remmers, H. H., and Grant, A. “The Vocabulary Load of Certain Secondary
School Mathematics Textbooks,” Journal of Educational Research, 18: 203-10,
October, 1928.

Thorndike's list is not entirely satisfactory as a criterion because
no attention is given to the particular meaning with which a word is
used.¹ Precise identification of technical terms in a particular
field would require a more complex criterion.
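
The analysis of vocabulary load against a ranked word list may be sketched as follows. This is a minimal illustration and not the procedure of Thorndike or of the studies cited: the function vocabulary_load, the six-word toy list, and the band edges are assumptions made for the example, and, like the criterion itself, the sketch takes no account of the meaning with which a word is used.

```python
import re
from collections import Counter


def vocabulary_load(text, ranked_words, band_edges):
    """Count the distinct words of `text` that fall in each band of a ranked list.

    ranked_words: the criterion list, most common word first.
    band_edges:   ascending upper ranks of the bands, e.g. (3000, 5000).
    Words absent from the list, or ranked beyond the last edge, are
    counted as "not in list".
    """
    rank = {word: i + 1 for i, word in enumerate(ranked_words)}
    counts = Counter()
    for word in set(re.findall(r"[a-z']+", text.lower())):
        label = "not in list"
        r = rank.get(word)
        if r is not None:
            for edge in band_edges:
                if r <= edge:
                    label = f"rank <= {edge}"
                    break
        counts[label] += 1
    return counts


# A toy list of six words stands in for the ten thousand of the criterion.
criterion = ["the", "of", "and", "school", "number", "fraction"]
sample = "The numerator of the fraction and the denominator"
load = vocabulary_load(sample, criterion, band_edges=(3, 6))
print(dict(load))
```

With a full criterion list the band edges would be set at the thousands named in the text; the words counted as not in the list are those that constitute the vocabulary load.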

The more common analyses of records of activities² are those
of pupil writings for language errors or misspelled words and
of arithmetical work for calculation errors. Charters³ has reported
an attempt to ascertain the arithmetical operations used
by salespeople. He selected at random 7337 charge checks
(records of purchase transactions in which the goods are charged
to customer accounts) and analyzed them for the addition and
multiplication combinations involved. Another illustration is
the analysis made of the Readers' Guide to Periodical Literature
for the three-year period of 1919-21.⁴ A card was made for
each of the eleven thousand topics appearing in the Index. The
number of articles relating to each topic was noted. The cards
were then sorted into piles, “one pile for each general field of human
action or interest that seemed to be indicated or called for
by the cards themselves.” The sorting was carried on in this way
until an apparently satisfactory grouping was attained. This
work of classification resulted in a list of 46 topics with a range
of from 9920 articles on the topic of government down to 89 for
mathematics. The research of Finley and Caldwell⁵ illustrates
another type of analysis. They sought to determine the extent
to which biological material appears in the public press.

¹ For a discussion of the importance of meanings in vocabulary analysis see
Dolch, E. W. Reading and Word Meanings. Boston: Ginn and Company, 1927.
129 pp.

² Records of activities include not only the types referred to here, but also those
secured by motion picture cameras, sound recording instruments, and other
apparatus, and by observational procedures. See page 47.

³ Charters, W. W. Curriculum Construction. New York: The Macmillan
Company, 1923, pp. 231-36.

⁴ Bobbitt, Franklin, et al. “Curriculum Investigations,” Supplementary
Educational Monographs, No. 31. Chicago: University of Chicago, 1926,
pp. 7-22.

⁵ Finley, C. W., and Caldwell, O. W. Biology in the Public Press. New York:
The Lincoln School of Teachers College, Columbia University, 1923. 151 pp.

Sometimes the formulation of criteria creates an important
subordinate problem because the investigator is exploring a new
field. For example, in a study of arithmetic texts by Monroe
and Clark¹ it was necessary to determine the elemental types
of problems before the analysis could be begun and a list of
333 problem types was evolved only after several weeks of
labor. In this case the task was unusually difficult because little
attention had been given to the determination of problem 
types. 

Many of the simpler types of analysis tend to be objective 
and a careful worker will secure highly accurate data, but not 
infrequently the process of analysis is rather highly subjective. 
In such cases steps should be taken to reduce the effect of sub- 
jectivity to a minimum. The first few hours devoted to the 
analysis should be considered a period of training and the 
work should be done over, preferably toward the close of the 
total period of analysis. A memorandum should be kept of all
decisions made, so that consistency may be attained. All of
the materials should be analyzed by the same person rather
than be divided among two or more workers. If the materials
can be analyzed independently by two or more persons, their 
results may be averaged. 
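
The averaging of independent analyses may be sketched as follows. The sketch is illustrative only: the function names and the counts of language errors are hypothetical, and the inspection of the largest disagreement is an addition suggested here rather than a step named in the text.

```python
def average_counts(first, second):
    """Average two analysts' counts for every category either one recorded."""
    categories = sorted(set(first) | set(second))
    return {c: (first.get(c, 0) + second.get(c, 0)) / 2 for c in categories}


def largest_disagreement(first, second):
    """Return the category on which the two analysts differ most."""
    categories = sorted(set(first) | set(second))
    return max(categories, key=lambda c: abs(first.get(c, 0) - second.get(c, 0)))


# Hypothetical counts of language errors found in the same compositions.
analyst_a = {"spelling": 40, "verb agreement": 12, "punctuation": 25}
analyst_b = {"spelling": 42, "verb agreement": 18, "punctuation": 25}

averaged = average_counts(analyst_a, analyst_b)
print(averaged)
print("Review first:", largest_disagreement(analyst_a, analyst_b))
```

A category on which the analysts differ widely suggests that the criterion for that category needs a more explicit statement in the memorandum of decisions.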

3. Interviewing.² The interview and the questionnaire are
used for securing unrecorded data in the possession of other
persons. Both of these techniques involve subjective elements,
but when appropriate precautions are taken, the data obtained
will usually be at least reasonably satisfactory.

¹ Monroe, W. S., and Clark, J. A. “The Teacher's Responsibility for Devising
Learning Exercises in Arithmetic,” University of Illinois Bulletin, Vol. 23,
No. 41, Bureau of Educational Research Bulletin, No. 31. Urbana: University
of Illinois, 1926. 92 pp.

² The interested reader will find a more extended consideration in the following
references:

Bixler, H. H. Check Lists for Educational Research. New York: Bureau of
Publications, Teachers College, Columbia University, 1928, pp. 38-40.

Bogardus, E. S. “The Social Research Interview,” Journal of Applied Sociology,
10: 69-82, September-October, 1925.

Bogardus, E. S. The New Social Research. Los Angeles: Jesse Ray Miller,
1926, pp. 69-130.

Sturtevant, S. M., and Hayes, H. “The Use of the Interview in Advisory
Work,” Teachers College Record, 28: 551-62, February, 1927.

Waples, Douglas, and Tyler, R. W. Research Methods and Teachers' Problems.
New York: The Macmillan Company, 1930, pp. 519-32.

In order to be successful the interviewer must be prepared
for his task. An important phase of this preparation is determining
the questions to be asked. An essential characteristic
of the questions formulated is their relevance to the problem.
The words and phrases used in asking the questions should be
such as are likely to be understood by the individuals interviewed.
It is desirable to avoid “leading” questions, since
they tend to bias the responses. A question beginning “Do
you . . .” will secure information of one type while a question
commencing “How do you . . .” will secure a different type of
information. Questions beginning “Do you . . .” and “How do
you . . .” are instrumental in securing factual information, while
questions beginning “Do you favor . . . ,” “How do you feel
about . . . ,” “Do you recommend . . .” function in the collection
of opinions, judgments, and the like. A question beginning
“Why do you . . .” may secure either facts, or opinions, or both.

General questions are not likely to be very effective. For 
example, consider a typical group of elementary school teachers
being asked, “What difficulties do you encounter in teaching 
silent reading?” Many of those interviewed will be able to 
recall only a few difficulties and probably none of them will 
mention all of their difficulties. More satisfactory results will 
be obtained if the interviewer asks specific questions: “Are you 
able to interest all pupils in silent reading?” “Are you able to 
determine the reading difficulties of pupils?” “Are you able 
to stimulate reading for enjoyment?” 

The persons to be interviewed should be representative of 
the population or conditions for which conclusions are desired. 
Some steps should be taken to enlist the cooperation of these 
individuals. It is helpful in this connection to secure the sponsorship
of some institution or association which those interviewed
are likely to respect. If one wishes to interview the
teachers in a given school system, the approach should be
through their superintendent. The investigator should arrange
appointments that will be convenient to the persons interviewed
and that allow sufficient time. It is
desirable to stimulate an appropriate frame of mind on the
part of the person to be interviewed. This may often be done
through an appeal to pride, some reference being made to the
competence of the person to give the information desired. It is
frequently effective to point out that the person interviewed
is being given an opportunity to aid in the study of an important
problem and that the information requested may not
be obtained in any other way. The interest of the person interviewed
can be stimulated by the interest exhibited by the research
worker in his problem.

An interview should not degenerate into a quiz in which the
person being questioned is caused embarrassment because he is
unable to give the information desired. Personal or confidential
matters should not be inquired into unless the interviewee indicates
a willingness to be questioned about them. Whenever
publication of the solicited information might cause embarrassment
to the interviewee, assurance should be given that he will
not be quoted and that information given in confidence will not
be revealed.

It may not be wise to have a list of questions in evidence
during the interview. It will sometimes be desirable for the
investigator to memorize his list of questions. Usually, however,
in collecting facts or opinions from mature individuals
the presence of a question list should not result unfavorably;
rather, it should aid in maintaining the control of the interview
needed if all questions are to be answered. When it is evident
that a question is not fully understood, the investigator should
explain the terms used, or restate the question. If the investigator
does not fully understand the response he should ask
further questions. The investigator should be careful at all
times to keep to the subject. 

It is desirable to record in detail the responses of the subjects
interviewed, but sometimes it may not be wise to do this
in the presence of the subject. The time required for making
the record may interfere with the interview, or it may be that
the subject will be intimidated or that a hostile attitude will be
engendered. When it appears that any of these conditions
will prevail, the interview should be written up as soon as possible
after it has been made. The writers are of the opinion
that most adults are not likely to be embarrassed by the open
recording of their responses to questions. A superintendent,
principal, or teacher is likely to respect the interviewer for his
evident system. It is probable that the presence of a list of
questions and the open recording of answers are most likely to
have unfavorable effects when children are being interviewed.

The use of the interview is illustrated by the studies of Rufi¹
and Sharman.² In addition to securing data by the administration
of tests, examination of records, and observation, Rufi collected
many facts by interviewing principals and teachers in
five small high schools. It is possible that some of these data
could have been collected by questionnaire rather than by
interview, but it is likely that this procedure would not have
been as satisfactory. Some of the facts obtained by interview
appear to be of the type which individuals would be hesitant
to record on a questionnaire, but would be less hesitant to reveal
in the course of a conversation. It is probable that in
many cases the principals and teachers elaborated their replies
so that they were clearly understood, which is not always
true in the case of responses to a questionnaire. It is also probable
that much information was volunteered relative to conditions,
which the investigator would not have had the forethought
to ask for if he had restricted his technique to the use
of a questionnaire.

¹ Rufi, John. “The Small High School,” Teachers College, Columbia University,
Contributions to Education, No. 236. New York: Bureau of Publications,
Teachers College, Columbia University, 1926. 145 pp.

² Sharman, J. E. “Physical Educational Facilities for the Public Accredited
High Schools of Alabama,” Teachers College, Columbia University, Contributions
to Education, No. 408. New York: Bureau of Publications, Teachers College,
Columbia University, 1930. 78 pp.

Data were obtained in the investigation of Sharman by
interviewing principals, athletic coaches, physical education
teachers, and other members of the faculties of a random
sample of 38 per cent of the 275 public accredited high schools
of Alabama. The investigator spent from one and one-half to
two and one-half hours in each of the schools. Data were obtained
relative to the distribution of full-time and part-time
physical education teachers, physical examiners, requirements
relative to health education, credit for physical education,
types of activities in physical education programs, and facilities
for physical education.
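
The drawing of a random sample of a stated fraction of a list of schools may be sketched as follows. The sketch does not describe how Sharman actually selected his schools, which the text does not report; the function draw_sample, the school identifiers, and the seed are assumptions made for the illustration. Thirty-eight per cent of 275 is 104.5, so the sample contains 104 or 105 schools according to the rounding.

```python
import random


def draw_sample(units, fraction, seed=None):
    """Draw a simple random sample containing about `fraction` of `units`."""
    rng = random.Random(seed)          # a fixed seed makes the draw repeatable
    k = round(fraction * len(units))   # number of units to select
    return rng.sample(units, k)        # selection without replacement


# Hypothetical identifiers for the 275 schools; the fraction is from the text.
schools = [f"school-{i:03d}" for i in range(1, 276)]
sample = draw_sample(schools, fraction=0.38, seed=1930)
print(len(sample), "schools selected")
```

Recording the seed, or the list of schools drawn, allows another worker to verify that the schools visited were in fact those selected by chance.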

4. Constructing and using questionnaires. The questionnaire
has long been subjected to criticism as a means of collecting
data. The following comment was made in 1839:

It is impossible to expect accuracy in returns obtained by circulars,
various constructions being put upon the same question by
different individuals who consequently classify their replies upon
various principles.¹

Thorndike made the following statement in 1911:

One vice of statistical studies in education today is the indiscriminate
use of lists of questions as a means of collecting data by
correspondence.²

Since 1911, numerous writers have criticized the questionnaire
as a means of collecting data.³ Examination of the various
criticisms reveals three general claims.

1. The questionnaire is not an effective means of collecting data.
The reasons given include (a) construction of questionnaire
unsatisfactory, (b) responses carelessly made or mere opinions,
(c) per cent of returns too low or the returns are not from a
representative sample of the population.

2. The problem studied is unimportant, or the investigation is so
limited in scope that the findings have little significance.

3. The questionnaire is an unjustifiable nuisance to superintendents,
high school principals, and other recipients. Too many questionnaires
are mailed out; many of them are unreasonably long;
in many cases some of the information asked for is too personal
or could be obtained from published sources; in some cases the
recipient is asked to make time-consuming calculations.

¹ “Report of a Committee of the Manchester Statistical Society on the State
of Education in the County of Rutland in the Year 1838,” Journal of the
Statistical Society of London, 2: 303, October, 1839.

² Thorndike, E. L. “Quantitative Investigations in Education,” School
Review Monograph, Vol. 1, 1911, p. 43.

³ The following are some of the more extreme criticisms:

Burk, Frederic. “On a Certain Questionnaire,” School and Society, 15: 170-73,
February 11, 1922.

Butterfield, E. W. “Professional Ethics and the Questionnaire,” School and
Society, 11: 55-56, January 10, 1920.

Butterfield, E. W. “Educational Surveys,” Educational Review, 68: 1-5,
June, 1924.

Butterfield, E. W. “The Plenary Inspiration of the Dotted Line,” Educational
Review, 69: 1-4, January, 1925.

Flaccus, Quintus H. (Pseudonym). “Research in the Payment of School
Executives,” School and Society, 32: 806-07, December 13, 1930.

Hankinson, Frank. “The Blight of the Questionnaire,” Educational Review,
73: 102-08, February, 1927.

Ruckmick, C. A. “The Uses and Abuses of the Questionnaire Procedure,”
Journal of Applied Psychology, 14: 32-41, February, 1930.

Whitney, F. P. “Questionnaire Craze,” Educational Review, 68: 139-40,
October, 1924.

When these claims are examined critically it is apparent that
competent and courteous use of the questionnaire in the study
of worth while problems is not condemned. The questionnaire
is an important labor-saving device and it is obvious that its
use as an instrument for collecting data cannot be completely
dispensed with. The United States Office of Education, state
departments of education, and other administrative units must
use it as a means of collecting information that is generally
recognized as valuable.¹ The cost of collecting this information
by actual visitation and interview would be prohibitive. It
would also be prohibitive in many survey investigations which
cover a large area.

¹ For discussions from this point of view see:

Bawden, W. T. “Searching after the Truth,” Industrial Education Magazine,
27: 172-73, December, 1925.

Douglass, H. R. “The Questionnaire: To Be or Not to Be,” School and Society,
15: 397-99, April 8, 1922.

Fallon, J. F. “The Questionnaire in Educational Research,” Catholic
Educational Review, 23: 539-45, November, 1926.

Koos, L. V. The Questionnaire. New York: The Macmillan Company, 1928.
178 pp.

Norton, J. R. “The Questionnaire,” Research Bulletin of the National Education
Association, Vol. 8, No. 1. Washington: National Education Association,
1930. 51 pp. A scale for rating questionnaires is suggested.

Perry, H. E. “The Questionnaire Method,” Journal of Applied Sociology,
10: 155-58, November-December, 1925.

Although it is apparent that unqualified condemnation of
the questionnaire is not justified, the present writers are fully
aware that, even in the hands of a competent investigator, this
instrument may yield faulty data. Wylie¹ has reported a
personal first-hand investigation of the data secured by a questionnaire
administered to eleven- and twelve-year-old pupils.
Although the questions called for simple factual information,
many answers involved a significant error. Self-interest of the
respondents is sometimes a cause of a systematic error.² Mathews
has shown that the order of the printed response words to
be underlined on an interest questionnaire may introduce a
systematic error.³ Frequently it is not possible to give precise
and accurate answers to questions calling for factual statements
descriptive of practices or conditions without adding
much explanation. It is well known that failure of many of the
recipients to respond to a questionnaire is a frequent cause of
systematic errors in questionnaire data. Possibly the most
adverse criticism that may be made of this technique, as it is
most usually employed, is that the degree of reliability, validity,
and representativeness of the data is uncertain.

The use of the questionnaire should be limited to worth while
inquiries for which the needed data cannot be conveniently
obtained by other means.⁴ Even in such cases the study should
not be attempted unless the resources at the command of the
investigator are such that success may reasonably be anticipated.
These statements may seem platitudinous, but a surprising
number of questionnaire investigations deserve to be
classified as of minor, if not trivial importance.

¹ Wylie, A. T. “To What Extent May We Rely upon the Answers to a School
Questionnaire?” The Journal of Educational Method, 6: 252-57, February,
1927.

² For a further discussion of this, see Stoke, S. M., and Lehman, H. C. “The
Influence of Self-Interest upon Questionnaire Replies,” School and Society,
32: 435-38, September 27, 1930.

³ Mathews, G. O. “The Effect of the Order of Printed Response Words on an
Interest Questionnaire,” Journal of Educational Psychology, 20: 128-34,
February, 1929.

⁴ Sometimes the desired data may be secured from published reports, and in
other cases a modification of the problem will change the demands for data so
that they may be secured from such sources. Institutional records and the files
of state departments of education and of accrediting agencies are valuable
sources of data that should not be overlooked.

A questionnaire will be most successful when it is limited to 
requests for simple factual information in the possession of the 
recipients or easily accessible to them. A questionnaire that 
requires the recipient to spend much time in collecting or com- 
puting the requested information is not likely to be very suc- 
cessful and should be used only when the problem is one of 
considerable importance. A questionnaire asking for expres- 
sions of opinion should be employed only when the problem is 
important and the correspondents, in addition to being persons 
whose opinions will be significant, may be expected to be suffi- 
ciently interested so that they will give the questions thoughtful 
consideration. 

The questions should be stated so clearly that they cannot 
be misinterpreted. Technical words and other unusual terms 
should be explained. The questions should be formulated so 
that they will not suggest or bias the answers. They should 
call for responses involving as little writing as possible. Ques- 
tions calling for numerical data or for responses of “yes” or 
“no,” underlining, or checking are most desirable. In some 
cases, questions should be inserted which will serve as checks to 
other questions. When this precaution is observed, the investi- 
gator is able to make an estimate of the reliability of his ques- 
tionnaire data on the basis of “internal evidence.” 
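
The use of check questions described above can be illustrated with a short sketch. It is only one possible reading of “internal evidence”: the item names, the replies, and the choice of simple agreement as the measure are all assumptions made for the illustration, not part of the original discussion.

```python
# Hypothetical illustration: each respondent answers a main item and a
# differently worded check item that should receive the same answer.
def check_agreement(responses, item, check_item):
    """Return the fraction of respondents whose answers to `item` and
    `check_item` agree, ignoring respondents who skipped either one."""
    answered = [r for r in responses
                if r.get(item) is not None and r.get(check_item) is not None]
    if not answered:
        return None  # no basis for an estimate
    agreeing = sum(1 for r in answered if r[item] == r[check_item])
    return agreeing / len(answered)

# Invented replies from four respondents (not data from any study).
replies = [
    {"q3_has_library": "yes", "q17_library_check": "yes"},
    {"q3_has_library": "no", "q17_library_check": "no"},
    {"q3_has_library": "yes", "q17_library_check": "no"},
    {"q3_has_library": "yes", "q17_library_check": None},
]

rate = check_agreement(replies, "q3_has_library", "q17_library_check")
print(f"Agreement on check items: {rate:.2f}")
```

A low rate of agreement would suggest that one or both questions are being misread, or that the answers are given carelessly.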

The respondent should not be asked to do work which the 
investigator can do himself, such as computing totals, aver- 
ages, or per cents. The questionnaire should be just long 
enough to secure the necessary data. Data irrelevant to the 
problem should never be requested. The investigator, how- 
ever, should make certain that he has anticipated all of his 
needs. If the data required for the problem are not thus antici- 
pated, the investigator may find it necessary to do without 
important information, or to impose a second questionnaire 
on his respondents. 

In preparing a questionnaire considerable attention should 
be given to its appearance. An attractive looking blank is more 
likely to secure a good response than an unattractive one. If 
possible, the questionnaire should be printed. When it is more 
than two or three typewritten pages in length and several hun- 
dred copies are required, printing is usually less expensive than 
mimeographing. The questionnaire should have a heading 
which includes an institutional or associational designation, and 
possibly a title. Spaces should be provided for the name, 
address, and position of the respondent. The questions should 
be spaced with care, ample room being allowed for responses. 
Respondents are less likely to overlook items when they are 
spaced effectively. The size of the questionnaire should be 
one that is convenient for handling and filing. Letter size 
(8½ by 11) is recommended. Legal size sheets are frequently 
inconvenient. 

The first draft of the questionnaire should be submitted to 
competent and disinterested persons for criticism. If possible 
it should be given a preliminary trial by submitting it to per- 
sons typical of the proposed mailing list. Criticism and trial 
often reveal inadequacies not apparent to the author of the 
questionnaire. For example, it may be found that certain ques- 
tions are easily misinterpreted. Analysis of these questions 
may disclose that the terms used are more technical than neces- 
sary or are in need of definition. Attempts to tabulate the data 
secured in a trial may suggest changes which will make for 
greater ease in tabulating. 

The questionnaire should be accompanied by a tactful letter 
of explanation. The recipient should be informed of the nature 
of the problem and of its importance. It should be indicated 
that the responses will be held in confidence if their nature 
makes this action desirable. In this connection it should be 
mentioned that the questions should not be unduly personal in 
nature. Before sending out an elaborate questionnaire the 
willingness of the prospective recipients to respond should be 
determined by a preliminary inquiry. When this precaution is 
taken, greater interest is likely to be stimulated. Furthermore, 
some of the expense of sending elaborate questionnaires to indi- 
viduals disinclined to respond will be avoided. The recipient 
should be told that a summary of the data collected will be 
mailed to him. It is an excellent procedure, in cases where the 
summary refers specifically to institutions or school systems, to 
accompany the summary with requests for criticism by its re- 
cipients. When these criticisms have been obtained, they may 
be used to improve the report of the study. 

The questionnaire should be mailed at an opportune time. 
Seasons of vacation or periods of excessive activity should be 
avoided. For example, it is unwise to mail out a questionnaire 
just prior to the Christmas holidays or at the beginning or close 
of a semester. The questionnaire should be accompanied by a 
self-addressed stamped envelope for its return. A follow-up 
letter may be sent, under certain conditions, to those who fail 
to respond. Frequently, a tactful follow-up letter will greatly 
increase the per cent of responses. It is desirable to keep a 
record of the responses. In some cases it may be worth while to 
graph the number of questionnaires returned daily and to mail 
the follow-up letters when the curve falls unduly. Toops has 
reported two studies which reveal that the per cent of response 
may be increased to an acceptable figure by means of follow-up 
letters.¹ A similar conclusion has been reported by Lindsay.² 
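
The suggestion to mail follow-up letters “when the curve falls unduly” leaves the judgment to the investigator. The sketch below shows one way such a rule might be made explicit. The threshold (one fourth of the peak), the requirement of two consecutive low days, and the daily counts are all assumptions chosen for illustration.

```python
def follow_up_day(daily_returns, fraction=0.25, run=2):
    """Return the 1-based day on which to mail follow-up letters: the first
    day that completes `run` consecutive days whose returns are below
    `fraction` of the highest daily return seen so far.
    Returns None if the rule is never triggered."""
    peak = 0
    low_days = 0
    for day, count in enumerate(daily_returns, start=1):
        peak = max(peak, count)
        if peak > 0 and count < fraction * peak:
            low_days += 1
        else:
            low_days = 0
        if low_days >= run:
            return day
    return None

# Invented record of questionnaires returned on each day after mailing.
returns = [3, 12, 20, 16, 9, 4, 3, 1]
day = follow_up_day(returns)
print("Mail follow-up letters on day", day)
```

Requiring more than one low day guards against reacting to a single slack day, such as a Sunday on which no mail is delivered.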

In connection with emphasizing what an investigator who 
employs a questionnaire should do, it may be pointed out that a 
recipient should recognize his responsibility. Many problems 
cannot be studied except by employing a questionnaire, and 
when the problem is a worthy one and the investigator has 
exhibited competence and good judgment in preparing the 
questionnaire, the recipient should make an effort to respond. 
It is not courteous to throw it in the waste-basket. If there are 
valid reasons why the information should not be given, the 
questionnaire should be returned with a note of explanation. 
In answering the questions asked, an effort should be made to 
give correct information. If a question is not understood or if 
the recipient is unable to give an answer that he believes to be 
correct, the question should be omitted. When there is some 
doubt in regard to the meaning of the question asked, a nota- 
tion indicating the interpretation given it may be made. 

1 Toops, H. A. “Returns from Follow-Up Letters to Questionnaires,” Journal 
of Applied Psychology, 10: 92-101, March, 1926. 

Toops, H. A. “Validating the Questionnaire Method,” Journal of Personnel 
Research, 2: 153-69, August-September, 1923. 

2 Lindsay, E. E. “Questionnaires and Follow-Up Letters,” Pedagogical 
Seminary, 28: 303-07, September, 1921. 

In concluding this discussion of the details of the question- 
naire procedure, it may be reiterated that the problem should 
be worth while. The use of a questionnaire is not justified if 
it is feasible to collect the data by other means. The question- 
naire should be carefully prepared and mailed to a wisely selected 
list of persons who are in a position to provide the desired in- 
formation. The graduate student about to collect data by means 
of a questionnaire should secure the sponsorship of his institu- 
tion. In many cases the sponsorship of an accrediting agency, 
or educational association, will secure an excellent response. 
Frequently, the letter accompanying the questionnaire should 
be signed by the sponsor rather than by the student. Many 
persons are unwilling to respond to a questionnaire from an 
unknown person, but are glad to respond to one authorized 
by an institution or organization that they respect. It should be 
mentioned in this connection that institutions and organizations 
should not violate this trust. The sponsor should accept the re- 
sponsibility for making certain that the problem is worth while, 
a questionnaire necessary, and that the blank is well constructed. 
Furthermore, the sponsor should see that the respondents re- 
ceive a summary of the results without undue lapse of time. 

5. Securing a current record of activity.¹ When an activity 
does not normally result in an observable record, the va- 
rious devices and techniques employed to secure a current rec- 
ord may be classified under three heads: (1) mechanical, 
(2) stenographic, and (3) observational. The moving picture 

1 The reader should bear in mind that after a record has been secured, analysis 
is required before the process of collecting data is complete. 
camera, dictaphone, and other recording instruments are 
included under the first head. Usually the operation of the 
instrument requires little technical skill, but not infrequently 
considerable ingenuity is required to devise an instrument 
adapted to the conditions under which the record must be 
secured. The apparatus now used at the University of Chicago 
to secure a photographic record of the eye-movements of a 
reader is a good illustration.¹ 

When a competent stenographer or stenotypist is not avail- 
able, a satisfactory record may be obtained by a person who 
takes a few notes and then soon afterwards expands them into 
a partial verbatim account. This method is generally employed 
by competent newspaper reporters, especially when interview- 
ing persons whose time is limited. In an investigation reported 
by Helseth ² a record of recitations was obtained by having two 
or more observers take such notes as they could. These notes 
were turned over to the teacher who expanded them into an 
approximately verbatim account of the recitation. Other cases 
have been reported by Horn,³ Knudsen,⁴ and Stevens.⁵ 

The record secured by an experienced and competent ste- 
nographer may be expected to be highly accurate, at least so 
far as the meaning of the record is concerned, but if details of 
diction are considered, a stenographic record will usually in- 
volve errors.⁶ Obviously, a larger number of errors is to be 
expected when the record is an elaboration of notes taken by 
an observer. For some purposes, however, accuracy in the details 
of a record is not essential. 

1 See page 3 for references. 

2 Helseth, Inga Olla. “Children’s Thinking,” Teachers College, Columbia 
University, Contributions to Education, No. 209. New York: Bureau of Publica- 
tions, Teachers College, Columbia University, 1926. 163 pp. 

3 Horn, Ernest. “Stenographic Reports of Speyer School Lessons,” Teachers 
College Record, 16: 33-40, January, 1915. 

4 Knudsen, C. W. Evaluation and Improvement of Teaching. Garden City, 
New York: Doubleday, Doran and Company, 1932, pp. 489-524. 

5 Stevens, Romiett. “The Question as a Measure of Efficiency in Instruction,” 
Teachers College, Columbia University, Contributions to Education, No. 48. New 
York: Bureau of Publications, Teachers College, Columbia University, 1912. 95 pp. 

6 Greene, H. A., and Betts, E. A. “A New Technique for the Study of Oral- 
Language Activities,” The Elementary School Journal, 33: 753-61, June, 1933. 
In descriptive recording, the human element is even more 
important. An observer tends to interpret what he observes. 
For example, an observer may see that certain symbols appear 
on the blackboard. He perceives, or observes, that these symbols 
are addition examples. His perception or observation has an 
inferential element since he has given meaning to his sensations. 
Further inference may be made before the observation is 
recorded. He may infer that the class present in the classroom 
has been engaged in drill in addition, and further, he may formu- 
late and record some inferences with respect to the quality of 
this drill. It is evident, therefore, that the skill of the observer 
and his experience are important factors in descriptive recording. 

The observer must identify what he should record. Check- 
lists, score cards, and other devices of this type are useful aids 
since they focus the attention of the observer on the items which 
are to be observed. Rating scales are an aid since they serve to 
control in some measure inferences of the evaluative type. 
Without the aid of some device to focus his attention, an 
observer, especially an untrained one, is not likely to report a 
satisfactory description. The record is likely to be a mixture 
of statements descriptive of action or conditions observed, in- 
ferences relative to mental activity, and evaluations of aspects 
of the scene observed.¹ Given a checklist or other suitable device, 
the record of a trained observer is likely to be rather highly 
reliable. An untrained observer working without a checklist 
or other aid is not likely to secure a reliable record unless the 
items to be recorded are obvious.² 


1 When a graduate group consisting mainly of experienced supervisors were 
asked to list observable characteristics indicative of the effectiveness of teaching, 
most of the items reported were not observable. Considerable study of the 
problem was necessary before the members of this group could distinguish be- 
tween observable characteristics and inferred action or evaluations. Monroe, 
W. S. “Observable Characteristics of Efficiency in Teaching,” Elementary 
School Journal, 27: 697-99, April, 1927. 

2 Reckless, W. C., and Smith, Mapheus. “The Agreement of Three Observers 
after Practice in Simultaneous Recording of Behavior,” Journal of Applied 
Psychology, 18: 635-44, October, 1934. In this study the nature of the observa- 
tions to be made was obvious from the general instructions. The findings show a 
rather high degree of agreement. 
When the presence of one or more observers is likely to affect 
the activity of which a record or description is desired, a tech- 
nique should be devised which will eliminate this difficulty. 
This is especially important in the observation of young children. 
Gesell and his associates at Yale University have perfected a 
one-way vision screen for their study of the behavior of infants.¹ 

A number of devices have been prepared for the purpose of 
securing records of the amount and character of pupil participa- 
tion in classroom activity. Usually these devices are seating 
charts of the class in which the squares representing the indi- 
vidual pupils are used as the spaces within which to record the 
symbols showing the extent and character of the pupil partici- 
pation. Puckett devised symbols to indicate that a pupil asked 
a question, raised his hand once without being called on, raised 
his hand another time, was called on and made a good recitation, 
was called on once when he didn’t have his hand raised and 
made a single word response.² In the device reported by Twitch- 
ell ³ the symbols are recorded along lines drawn from pupils’ 
names spaced about the chart. Some of the symbols refer to 
such items as: “called on and responded-verbally or otherwise 
(without hand raised),” “called on and gave no response or 
said, ‘I don’t know,’” “called on (without hand raised) for 
board work and responded with board work,” “volunteered 
to participate by raising hand and was called on.” Horn 
prepared a device which is somewhat simpler than those of 
Puckett and Twitchell, and which does not provide for record- 
ing so many details.⁴ 

1 Gesell, Arnold. Infancy and Human Growth. New York: The Macmillan 
Company, 1928. 418 pp. 

The interested reader should consult also: 

Weiss, A. P. “The Measurement of Infant Behavior,” Psychological Review, 
36: 453-71, November, 1929. 

2 Puckett, R. C. “Making Supervision Objective,” The School Review, 
36: 209-12, March, 1928. 

3 Twitchell, D. F. “An Objective Measure in Supervision,” Journal of Educa- 
tional Research, 19: 128-34, February, 1929. 

4 Horn, Ernest. “Distribution of Opportunity for Participation among the 
Various Pupils in Class-Room Recitations,” Teachers College, Columbia Univer- 
sity, Contributions to Education, No. 67. New York: Bureau of Publications, 
Teachers College, Columbia University, 1914, pp. 4-5. 
Morrison ¹ describes a technique for measuring group atten- 
tion. The observer takes a position where he can observe easily 
all of the pupils in a class. The position recommended is the front 
of the room, but to one side so as to be out of the line of vision 
between the teacher and the pupils. The observer runs his eyes 
up and down each row of pupils and at minute intervals records 
the number of pupils who appear inattentive. The group atten- 
tion score is obtained by dividing the number of actual minutes 
of pupil attention by the number of possible minutes of pupil 
attention. For example, if there are 30 pupils in a class and the 
class-period is forty-five minutes long, the number of possible 
minutes of pupil attention is 30 × 45 = 1350. If the sum of 
the numbers of pupils inattentive for all of the minute intervals 
of the forty-five minute period is 86, then the actual minutes of 
pupil attention are 1350 − 86 = 1264. The group attention 
score is 1264 ÷ 1350 = .936 or 94 per cent. 
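
The computation just described can be written out as a short sketch. The function name and the minute-by-minute counts are assumptions; the counts are invented so that they total 86, the figure used in the example above.

```python
def group_attention_score(pupils, inattentive_per_minute):
    """Compute the group attention score: actual pupil-minutes of
    attention divided by possible pupil-minutes of attention.
    `inattentive_per_minute` holds one count for each minute interval."""
    if not inattentive_per_minute:
        raise ValueError("at least one observation is required")
    if any(n < 0 or n > pupils for n in inattentive_per_minute):
        raise ValueError("each count must lie between 0 and the class size")
    possible = pupils * len(inattentive_per_minute)
    actual = possible - sum(inattentive_per_minute)
    return actual / possible

# Invented counts for a forty-five minute period; they sum to 86.
counts = [2] * 41 + [1] * 4
score = group_attention_score(30, counts)
print(sum(counts), round(score, 3))
```

With 30 pupils this gives 1264 ÷ 1350, which agrees with the figure of 94 per cent in the text.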

The reader interested in collecting data with respect to in- 
dividual attention, group attention, or individual-group atten- 
tion should consult an article by Blume in which he gives an 
excellent discussion of the techniques and three facsimiles of 
data sheets used.² The reader will also find it profitable to 
study the discussions by Knudsen ³ and Gray ⁴ and the investiga- 
tions reported by Bjarnason ⁵ and Bridges.⁶ In the writings by 
Blume and by Knudsen evidence is presented to show that 
when the techniques are used with skill rather highly valid and 
reliable data are secured with respect to group attention. 

1 Morrison, H. C. The Practice of Teaching in the Secondary School. Chicago: 
The University of Chicago Press, 1926, pp. 119-20. 

2 Blume, C. E. “Techniques in the Measuring of Pupil Attention,” Scientific 
Method in Supervision, The Second Yearbook of the National Conference of Su- 
pervisors and Directors of Instruction. New York: Bureau of Publications, 
Teachers College, Columbia University, 1929, pp. 37-51. 

3 Knudsen, C. W. Evaluation and Improvement of Teaching. Garden City, 
New York: Doubleday, Doran and Company, 1932, pp. 2()3"tSl 
See also Knudsen, C. W. “A Program of High School Supervision,” Peabody 
Journal of Education, 7: 323-32, May, 1930. 

4 Gray, W. S. “Supervising Instruction in Reading,” Scientific Method in 
Supervision, The Second Yearbook of the National Conference of Supervisors 
and Directors of Instruction. New York: Bureau of Publications, Teachers 
College, Columbia University, 1929, pp. 181-92. Gray presents a “Group Ap- 
plication and Attention Chart” designed by C. R. Maddox. 

5 Bjarnason, Lofter. “Relation of Class Size to Control of Attention,” Ele- 
mentary School Journal, 26: 36-41, September, 1925. 

6 Bridges, K. M. B. “The Occupational Interests and Attention of Four- 
Year-Old Children,” Pedagogical Seminary and Journal of Genetic Psychology, 
36: 551-69, December, 1929. 

When the activity of which a descriptive record is desired 
extends over a considerable period of time, observation is 
seldom feasible. In such cases the persons involved may be 
asked to keep diaries. For example, Flowers ¹ sent a request 
to 170 principals of elementary schools to keep diaries of their 
daily work for a period of two weeks. Sixty-seven of the princi- 
pals complied with the request. In using this technique it is well 
to restrict the record to activities that individuals are capable 
of observing with respect to themselves. A principal may easily 
note in a diary the amount of time given to a conference with a 
parent. He should be able to record, at least so far as the broader 
aspects are concerned, the nature of the conference. It is doubt- 
ful, however, whether the technique will secure dependable 
data when much introspection, or retrospection, is involved. 
It is also probable that highly dependable records should not 
be expected from pupils when they are requested to keep diaries 
of their activities. 

6. Making estimates and ratings. Since estimates and ratings 
are subjective, a scale or score card is usually employed as a 
means of minimizing the effect of the subjectivity of the process. 
A scale consists of a series of samples or descriptions of the thing 
to be rated arranged in order of ascending quality or merit. 
Usually a numerical value is assigned to each sample. This 
instrument is illustrated by the scales used for measuring 
compositions, handwriting, drawings, and the like. A score 
card consists of a list of the characteristics with respect to which 
estimates are to be made. Usually the scale of points on which 
each characteristic is to be rated is given so that when the 
several ratings are combined they will be appropriately weighted. 

1 Flowers, I. V. “The Duties of the Elementary School Principal,” Ele- 
mentary School Journal, 27: 414-22, February, 1927. 

The “man-to-man” rating scale, devised in a seminar con- 
ducted at the Carnegie Institute of Technology by W. D. 
Scott, now president of Northwestern University, is a useful 
device when no better technique is feasible. The rater selects, 
for example, the best teacher he ever knew and writes the name 
at the top of a sheet of paper. He next selects the poorest teacher 
he ever knew and writes his name at the bottom. In between 
he writes the name of an average teacher, the name of a better- 
than-average teacher and the name of a poorer-than-average 
teacher. Numerical ratings such as 15, 12, 9, 6, and 3 may then 
be assigned to the five names listed. Descriptive statements 
may be added as a means of assisting the rater to keep in mind 
the significant characteristics of the teachers of the scale.¹ 
Freyd ² has developed a “graphic” rating scale for rating indi- 
viduals with respect to twenty traits such as neatness, physique, 
flexibility, and talkativeness. 

When employing a rating scale it is desirable to have several 
independent ratings made. The average of the several ratings 
will be more reliable than a single rating. Symonds ³ has sug- 
gested that in general eight is the minimum number of ratings 
that should be obtained when the resulting measures are to be 
considered diagnostic of individuals.⁴ 
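
The gain from averaging several ratings can be illustrated with the Spearman-Brown formula, which is not cited in the passage above and is introduced here only as one conventional way of expressing the point. It assumes that the raters are interchangeable, each rating having the same reliability. The single-rating reliability of .50 and the eight ratings are invented for the sketch.

```python
from statistics import mean

def pooled_reliability(single_rating_reliability, n_raters):
    """Spearman-Brown estimate of the reliability of the average of
    `n_raters` ratings, assuming every rating has the same reliability."""
    r = single_rating_reliability
    k = n_raters
    return k * r / (1 + (k - 1) * r)

# Invented ratings of one teacher by eight independent raters,
# on the 3-to-15 scale mentioned for the man-to-man device.
ratings = [9, 12, 9, 6, 12, 9, 9, 12]
average_rating = mean(ratings)
print("Average rating:", average_rating)

# Reliability of the average for 1, 2, 4, and 8 raters when a
# single rating is assumed to have a reliability of .50.
for k in (1, 2, 4, 8):
    print(k, round(pooled_reliability(0.5, k), 3))
```

Under these assumptions the estimate rises from .50 for one rating to about .89 for eight, with smaller gains from each additional rater.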

1 For discussions of the “man-to-man” rating scale, see: 

Rugg, H. O. “Is the Rating of Human Character Practicable?” Journal of 
Educational Psychology, 12: 425-38, 485-501, November, December, 1921; 
13: 30-42, 81-93, January, February, 1922. 

Symonds, P. M. Measurement in Secondary Education. New York: The 
Macmillan Company, 1927, pp. 341-43. 

2 Freyd, Max. “The Graphic Rating Scale,” Journal of Educational Psychol- 
ogy, 14: 83-102, February, 1923. 

For a graphic scale which may be used in rating students with respect to 
several attitudes, see: 

Herriott, M. E. “Attitudes as Factors of Scholastic Success,” University of 
Illinois Bulletin, Vol. 27, No. 2, Bureau of Educational Research Bulletin, No. 
47. Urbana: University of Illinois, 1929, pp. 65-72. 

3 Symonds, op. cit., p. 354. 

4 The graduate student or other research worker interested in rating as a 
means of measuring traits will find the following references helpful: 

Brueckner, L. J. “Scales for the Rating of Teaching Skill,” Bulletin of the 
University of Minnesota, Vol. 30, No. 12. Minneapolis: University of Min- 
nesota Press, 1927. 28 pp. 

Clark, W. W. “Whittier Scale for Grading Juvenile Offenses,” California 
Bureau of Juvenile Research Bulletin, No. 11. Whittier, Calif.: Calif. Bureau of 
Juvenile Research, Whittier State School, Apr., 1922. 8 pp. (Continued next page) 
Score cards are useful instruments in making estimates since 
they tend to maintain the same basis of evaluation from one 
observation to another. A significant limitation lies in the 
fact that an essential element may be lacking in the thing ob- 
served, and yet the total score may be high. For example, in 
the Strayer-Engelhardt Score Card for City School Buildings, 65 
points are to be deducted from the perfect score of 1000 points 
if there are no provisions for fire protection. Hence, a building 
without fire protection but otherwise well planned and con- 
structed would receive a relatively high score that would tend 
to be misleading with reference to the actual status of the 
building for school purposes. While this limitation should be 
remembered, it is not usually a serious one. 
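
The limitation can be seen in a small sketch. Only the two figures given above (a perfect score of 1000 and 65 points for fire protection) come from the text; lumping the remaining items together and flagging essential items that score zero are assumptions added for illustration and are not features of the Strayer-Engelhardt card.

```python
PERFECT_SCORE = 1000         # from the text
FIRE_PROTECTION_POINTS = 65  # from the text

def total_score(item_scores):
    """Add the points awarded on every item of the score card."""
    return sum(item_scores.values())

def missing_essentials(item_scores, essentials):
    """List the essential items on which no points were awarded."""
    return [name for name in essentials if item_scores.get(name, 0) == 0]

# A building with no fire protection that is perfect in every other respect.
building = {
    "fire protection": 0,
    "all other items": PERFECT_SCORE - FIRE_PROTECTION_POINTS,
}

score = total_score(building)
flags = missing_essentials(building, ["fire protection"])
print(score, flags)
```

The total of 935 looks favorable by itself, and only the separate list of missing essentials reveals the defect.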

In addition to score cards for the rating of school build- 
ings and equipment,¹ and of janitorial and engineering 

Cornell, E. L., Coxe, W. W., and Orleans, J. S. Rating Scale for School Habits. 
Yonkers-on-Hudson, New York: World Book Company, 1927. 

Haggerty, M. E., Olson, W. C., and Wickman, E. K. Behavior Rating Sched- 
ules. Scales for the Study of Behavior Problems and Problem Tendencies in 
Children. Yonkers-on-Hudson, New York: World Book Company, 1930. 11 pp. 

Haught, B. F. “A Scheme for Combining Incomplete Rankings,” Journal of 
Applied Psychology, 7: 168-72, June, 1923. 

Hollingworth, H. L. Judging Human Character. New York: D. Appleton 
and Company, 1922. 268 pp. 

Knight, F. B. “The Effect of the Acquaintance Factor upon Personal Judg- 
ments,” Journal of Educational Psychology, 14: 129-42, March, 1923. 

Knight, F. B., and Franzen, R. H. “Pitfalls in Rating Schemes,” Journal of 
Educational Psychology, 13: 204-13, April, 1922. 

Olson, W. C. Problem Tendencies in Children, A Method for Their Measure- 
ment and Description. Minneapolis: University of Minnesota Press, 1930. 
90 pp. 

Shen, Eugene. “The Influence of Friendship upon Personal Ratings,” 
Journal of Applied Psychology, 9: 66-68, March, 1925. 

Shen, Eugene. “The Reliability Coefficient of Personal Ratings,” Journal of 
Educational Psychology, 16: 232-36, April, 1925. 

Symonds, P. M. “Notes on Rating,” Journal of Applied Psychology, 9: 188- 
95, June, 1925. 

Thorndike, E. L. “A Constant Error in Psychological Ratings,” Journal of 
Applied Psychology, 4: 25-29, March, 1920. 

Wickman, E. K. Children’s Behavior and Teachers’ Attitudes. New York: 
The Commonwealth Fund, 1928. 247 pp. 

1 The interested reader should consult the following references: 

Anderson, C. A. “Tentative Score Card for Elementary School Desks and 
Seats,” American School Board Journal, 69: 46-47, July, 1924. 

Strayer, G. D. “Score Card for City School Buildings,” Fifteenth Yearbook of 
the National Society for the Study of Education, Part I. Bloomington, Illinois: 
Public School Publishing Company, 1916, pp. 41-51. 

Strayer, G. D., and Engelhardt, N. L. Standards for High School Buildings. 
New York: Teachers College, Columbia University, 1924. 95 pp. 

Strayer, G. D., and Engelhardt, N. L. “Score Card for Village and Rural 
School Buildings of Four Teachers or Less,” Teachers College Bulletin, Eleventh 
Series, No. 9, January 3, 1920. New York: Teachers College, Columbia Uni- 
versity, 1920. 22 pp. 

Strayer, G. D., Engelhardt, N. L., and Elsbree, W. S. Standards for the Ad- 
ministration Building of a School System. New York: Bureau of Publications, 
Teachers College, Columbia University, 1927. 40 pp. 

service,¹ a number have been prepared for rating such things as 
health habits, textbooks, teacher-characteristics, and supervisory- 
progress. Recently a unique score card has been prepared by 
Hall for the purpose of rating schools with respect to provisions 
for gifted children.² The Chapman-Sims Socio-Economic Scale ³ 

1 Engelhardt, N. L., Reeves, C. E., and Womrath, G. F. Score Card for 
Public School Janitorial-Engineering Service. New York: Bureau of Publica- 
tions, Teachers College, Columbia University, 1926. 6 pp. 

2 Hall, J. J. “How Does Your School Rate in Providing for Gifted Children?” 
Journal of Educational Research, 22: 81-88, September, 1930. 

3 Chapman, J. C., and Sims, V. M. “The Quantitative Measurement of 
Certain Aspects of Socio-Economic Status,” Journal of Educational Psychology, 
16: 380-90, September, 1925. 

Earlier investigations attempting to measure socio-economic status include: 

Counts, G. S. “The Selective Character of American Secondary Education,” 
Supplementary Educational Monographs, No. 19. Chicago: University of 
Chicago Press, 1922. 162 pp. 

Holley, C. B. “Relationships between Persistence in School and Home Con- 
ditions,” Fifteenth Yearbook of the National Society for the Study of Education, 
Part II. Chicago: University of Chicago Press, 1916. 119 pp. 

Kornhauser, A. W. “The Economic Standing of Parents and the Intelligence 
of Their Children,” Journal of Educational Psychology, 9: 159-64, March, 1918. 

Van Denburg, J. K. “Causes of the Elimination of Students in Public Second- 
ary Schools of New York City,” Teachers College, Columbia University, Contribu- 
tions to Education, No. 47. New York: Bureau of Publications, Teachers Col- 
lege, Columbia University, 1911. 206 pp. 

The following references will be of interest to the graduate student or other 
research worker in this connection: 

Clark, W. W., and Williams, J. H. “A Guide to the Grading of Neighbor- 
hoods,” Publications of Whittier State School, Department of Research Bulletin, 
No. 8. Whittier, California: Whittier State School, 1919. 25 pp. 

Chapin, F. S. “A Quantitative Scale for Rating the Home and Social En- 
vironment of Middle Class Families in an Urban Community: A First Ap- 
proximation to the Measurement of Socio-Economic Status,” Journal of Educa- 
tional Psychology, 19: 99-111, February, 1928. 

Moore, E. S. “The Development of Mental Health in a Group of Young 
Children,” University of Iowa Studies, Studies in Child Welfare, Vol. IV, No. 6. 
Iowa City, Iowa, 1931. 128 pp. (A facsimile is given on pp. 96-98 of the Iowa 
Child Welfare Research Station Scale for Rating of Home Influences.) 

(Continued next page) 
has been proposed as an instrument for measuring home en- 
vironment. Heilman used a modification of this scale in his 
study of “Factors Determining Achievement and Grade 
Location.” ¹ 

7. Selecting and administering tests. When adequately 
defined, the problem specifies the particular abilities or traits 
to be measured, and these specifications constitute criteria to 
be used in judging the validity of the test or tests being con- 
sidered. Determinations of validity have been made for some of 
the available tests, but a test that is highly valid for one pur- 
pose may be unsatisfactory when considered with reference to a 
different purpose.² When no available test appears satisfactory 
for securing the data specified by the problem, the investigator 
should, if he is able to do so, construct his own tests. Curtis ³ 

Terman, L. M., and Goodenough, F. L. “Racial and Social Origin,” Genetic
Studies of Genius, Vol. I. Stanford, California: Stanford University Press,
1925. (On pp. 67-69 is reproduced the Barr Scale for Ratings of
Occupational Status.)

Van Alstyne, Dorothy. “The Environment of Three-Year-Old Children,”
Teachers College, Columbia University, Contributions to Education, No. 366.
New York: Bureau of Publications, Teachers College, Columbia University,
1929. 108 pp. (Describes use of scale devised by Chapin.)

Watson, Goodwin. “A Scale for Rating Home Contributions to Personality
Development of Children,” Baltimore Bulletin of Education, 8:177-79, May,
1930.

Williams, J. H. “A Guide to the Grading of Homes,” Publications of Whittier
State School, Department of Research Bulletin, No. 7. Whittier, California:
Whittier State School, 1918. 21 pp.

1 Heilman, J. D. “Factors Determining Achievement and Grade Location,”
The Pedagogical Seminary and Journal of Genetic Psychology, 36:435-57,
September, 1929. For a briefer account see “The Relative Influence upon
Educational Achievement of Some Hereditary and Environmental Factors,”
Twenty-Seventh Yearbook of the National Society for the Study of Education,
Part II. Bloomington, Illinois: Public School Publishing Company, 1928,
pp. 35-65.

For a facsimile of the revised scale and a discussion of its construction
and validation see Heilman, J. D. “A Revision of the Chapman-Sims
Socio-Economic Scale,” Journal of Educational Research, 18:117-26,
September, 1928.

2 Errors of validity and errors of measurement are considered in Chapter V.
Portions of Chapter VII, which deals with the problem of measurement, may
also be read in this connection. The selection of tests in experimental
studies is considered in Chapter IX.

3 Curtis, F. D. “Some Values Derived from Extensive Reading of General
Science,” Teachers College, Columbia University, Contributions to
Education, No. 163. New York: Bureau of Publications, Teachers College,
Columbia University, 1924. 142 pp.

and Irion ¹ are examples of research workers who have devised measuring
instruments for dealing with their problems.

The investigator should also consider the cost of the test materials and
the time required for administration and for scoring the test papers. In
some types of studies, the availability of norms and of duplicate forms
must also be considered.

The test or tests selected should be carefully administered. If
comparisons are to be made with established norms, the published directions
for administering must be followed carefully. When such comparisons are not
to be made, the directions may be modified, but any changes should be made
intelligently, and it is usually important that the directions be the same
for all groups of pupils tested.

Application of basic techniques. The preceding exposition of the basic
techniques does not adequately reveal the difficulties encountered in
collecting data in educational research. Beyond the difficulties inherent
in the basic techniques, there are other problems. In experimental studies,
an important phase of the process of collecting data is setting up and
conducting the experiment.² Sometimes an investigator cannot obtain the
data called for by the problem, or it is not feasible for him to secure
them, and he must resort to indirect measurement. For example, certain
problems relating to silent reading call for measures of the mental
processes involved. A substitute for this information is secured by
measuring the eye-movements of the reader. In studies of the supply and
demand for teachers, the number of teachers available within a specified
area cannot be determined directly.³ Frequently it is not possible, or at
least not feasible, to collect data for the entire population specified by
the problem. For example, in a study to determine the value of certain
measures for predicting teaching success, the problem

1 Irion, T. W. H. “Comprehension Difficulties of Ninth-Grade Students in
the Study of Literature,” Teachers College, Columbia University,
Contributions to Education, No. 189. New York: Bureau of Publications,
Teachers College, Columbia University, 1925. 116 pp.

2 The procedure of experimentation is described in Chapter IX.

3 For an illustrative study of the supply and demand of teachers see
reference 53 at end of Chapter VIII.
may call for measures of teaching success for all students who enter
teacher training institutions. Of course, such data can be obtained for
only those who actually become teachers, and even in the case of this
population, measures of teaching success may be difficult or impossible to
obtain for all members because the persons involved are scattered over a
wide area. When data for only a portion of the population are secured, the
dependability of the findings is a function of the representativeness of
the sample studied. Hence, the investigator is confronted with the problem
of obtaining a sample that is representative or one whose degree of
representativeness is known.

When a universe is stratified, that is, consists of relatively homogeneous
units or groups, a carefully selected sample from each of the several
strata is likely to be highly representative of the total universe. For
example, if it is desired to sample pupil achievement in the schools of a
large city or of a state, the first step would be to classify the schools
on the basis of size or other appropriate criteria. The administration of
the test to a few wisely selected classes within each group will be likely
to result in a highly representative sample of the total population. When
an investigator does not have an opportunity to select a stratified sample,
he may be able to present evidence to show that his sample is highly
representative. For example, if data have been obtained from the
fifth-grade pupils in a given city, he may be able to show that this group
of pupils is highly representative of fifth-grade pupils in general by
presenting evidence of their general intelligence, chronological age, and
other factors which may be expected to influence school achievement.
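
The stratified procedure described above may be sketched in code. This is only an illustration under stated assumptions: the school names, the enrollments, the three size classes, and the function names are all hypothetical, and the selection within each stratum is made at random, whereas the text speaks of "wisely selected" classes chosen by judgment.

```python
import random

def stratified_sample(schools, stratum_of, per_stratum, seed=0):
    """Group schools into strata, then draw a few from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for school in schools:
        strata.setdefault(stratum_of(school), []).append(school)
    sample = {}
    for name, members in sorted(strata.items()):
        # Never ask for more schools than the stratum contains.
        k = min(per_stratum, len(members))
        sample[name] = rng.sample(members, k)
    return sample

def size_class(school):
    """Hypothetical classification of a school by enrollment."""
    enrollment = school[1]
    if enrollment < 500:
        return "small"
    if enrollment < 1000:
        return "medium"
    return "large"

# Hypothetical universe of twenty schools: (name, enrollment).
schools = [("School %02d" % i, 100 + 75 * i) for i in range(20)]
sample = stratified_sample(schools, size_class, per_stratum=2)

for stratum, chosen in sample.items():
    print(stratum, chosen)
```

Because every stratum contributes to the sample, no size class can be missed by accident, which is the property the paragraph relies upon.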

Occasionally a more mechanical system of sampling may be employed. For
example, the students enrolled in a large school may be sampled by taking
every fiftieth record from an alphabetical file. The defined words in a
dictionary may be sampled by taking the first defined word on the
odd-numbered pages. The vocabulary of a textbook may be sampled by taking
the words in certain lines on every fifth page. Lively and Pressey ¹

1 Lively, B. A., and Pressey, S. L. “A Method for Measuring the ‘Vocabulary

have proposed a method of obtaining a thousand-word sample of the
vocabulary of textbooks. The procedure of sampling is based upon an
estimate of the number of pages which must be sampled, taking one line per
page, in order to cover one thousand words. For example, a book of five
hundred pages with an average of ten words per line would be sampled by
taking a specified line on every fifth page throughout the book. Patty and
Painter ¹ recommend samples that are proportional to the length of the
books analyzed.
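
The arithmetic of the five-hundred-page example can be checked with a short sketch. The function name and its parameters are hypothetical; the sketch assumes one line is taken per sampled page and that the page interval is rounded down so that at least the target number of words is covered.

```python
def page_interval(total_pages, words_per_line, target_words=1000):
    """Return (lines needed, page interval) for a one-line-per-page sample."""
    # Ceiling division: lines required to reach the target number of words.
    lines_needed = -(-target_words // words_per_line)
    # Spread those lines evenly over the book, at least one page apart.
    interval = max(1, total_pages // lines_needed)
    return lines_needed, interval

lines_needed, interval = page_interval(total_pages=500, words_per_line=10)
pages = list(range(interval, 500 + 1, interval))

print(lines_needed, interval, len(pages))
```

For the book in the text this gives one hundred lines taken on every fifth page, in agreement with the example.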

If the procedure of selection is such that the sample is determined merely
by chance, it is called random. A random sample is not necessarily
representative, but any non-representativeness is due to the operation of
chance. Frequently in educational research it is difficult to demonstrate
that the conditions of random sampling ² have been satisfied. They are
likely to be approximated when an alphabetical file is sampled by taking
records at regular intervals.³ Critical inquiry has revealed that in a
number of cases procedures presumed to result in a random sample have
failed to do so. For example, Williams ⁴ has shown that when the defined
words in a dictionary are sampled by taking the last defined word on each
page, the sample thus obtained is not random when judged with respect to
children's vocabularies. The more commonly used words, whose definitions in
general occupy more space than those of unusual words, are more likely to
be selected. Hence, a sample obtained by taking the last defined word on
pages of a dictionary is biased in favor of the more

Burden’ of Textbooks,” Educational Administration and Supervision,
9:389-98, October, 1923.

1 Patty, W. W., and Painter, W. I. “A Technique for Measuring the
Vocabulary Burden of Textbooks,” Journal of Educational Research,
24:127-34, September, 1931.

2 For a statement of these conditions see Yule, G. U. An Introduction to
the Theory of Statistics, Eighth Edition. London: Charles Griffin and
Company, 1927, pp. 259 f.

3 For an illustration of this method of random sampling and evidence of the
representativeness of the samples secured, see Wood, Ben D. “The
Reliability of Predictions of Proportions on the Basis of Random Sampling,”
Journal of Educational Research, 4:390-95, December, 1921.

4 Williams, H. M. “Some Problems of Sampling in Vocabulary Tests,” Journal
of Experimental Education, 1:131-33, December, 1932.
commonly used words. Ward ¹ has criticized the method of sampling proposed
by Lively and Pressey when used as a means of measuring the vocabulary
burden of textbooks. Dolch contends that any process of sampling fails to
reveal the repetition of words and hence operates to make the book appear
more difficult than it really is, because the repetition of words in a text
tends to reduce its vocabulary difficulty.² Walker ³ describes a number of
sampling procedures in which the conditions of random sampling are not
satisfied. The sampling that usually occurs when a questionnaire is
employed in collecting data is likely not to be random. It appears,
therefore, that random sampling is not often possible in educational
research. A sample should not be treated as random unless convincing
evidence can be cited in support of this characteristic.
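
The bias that arises from taking the last defined word on each page of a dictionary can be imitated with a small simulation. Everything in it is an assumption made for illustration: the dictionary is invented, 30 per cent of its entries are treated as common words with definitions of 6 to 12 lines, the remainder as unusual words with definitions of 1 to 3 lines, and a page holds 60 lines. It is not a reproduction of the study cited.

```python
import bisect
import itertools
import random

def last_entry_per_page(entry_lengths, page_length):
    """Index of the last entry that begins on each completed page."""
    ends = list(itertools.accumulate(entry_lengths))  # line just past each entry
    starts = [0] + ends[:-1]                          # first line of each entry
    picks = []
    for page_end in range(page_length, ends[-1] + 1, page_length):
        # Count the entries that start before the page ends; the last of
        # these is the final defined word on that page.
        picks.append(bisect.bisect_left(starts, page_end) - 1)
    return picks

rng = random.Random(1)
is_common = [rng.random() < 0.3 for _ in range(20000)]
lengths = [rng.randint(6, 12) if c else rng.randint(1, 3) for c in is_common]

picks = last_entry_per_page(lengths, page_length=60)
share_all = sum(is_common) / len(is_common)
share_picked = sum(is_common[i] for i in picks) / len(picks)

print("common words in dictionary: %.2f" % share_all)
print("common words in sample:     %.2f" % share_picked)
```

The entry that straddles the foot of a page is more likely to be a long one, so the share of common words in the sample is far above their share in the dictionary.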

A practical question in sampling concerns the size of the sample that may
be considered satisfactory. When the sampling is random, the probable error
formulae provide a means for determining the number of cases necessary to
reduce the probable error of the statistics due to random sampling to any
specified limit. In certain types of situations the representativeness of
the sample may be estimated by noting the effect of the addition of
sub-samples. Suppose a large number of pupil compositions have been
accumulated and it is desired to ascertain the types of language errors and
their relative frequencies. The investigator may select a series of
relatively small random samples and note the changes in his tabulations as
additional samples are analyzed. When he finds that no new types of error
are being added and the relative frequencies are approximately constant, he
is justified in assuming that the total sample is


1 Ward, J. L. “Measuring ‘Vocabulary Burden,’” American School Board
Journal, 71:98, September, 1925.

2 Dolch, E. W. “Sampling of Reading Matter,” Journal of Educational
Research, 22:213-15, October, 1930.

3 Walker, H. M., and Students. “The Sampling Problem in Educational
Research,” Teachers College Record, 30:760-74, May, 1929.

See also Olson, W. C., and Cunningham, E. M. “Time-Sampling Techniques,”
Child Development, 5:41-58, March, 1934. A bibliography of 76 items is
included.
large enough to be highly representative. Johnson and Eurich ¹ have
reported a study in which they show that a random sample including 30 per
cent of the cases yielded satisfactory results in a certain situation.
Although one should not generalize from a single investigation, it is
likely that a random sample of this size will usually be satisfactory. In
some cases, a smaller random sample may be satisfactory.
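
The use of a probable error formula to fix the size of a random sample may be illustrated as follows. The sketch assumes the statistic is a mean, whose probable error is commonly taken as 0.6745 times the standard deviation divided by the square root of the number of cases; the standard deviation of 12 and the limit of 1 score point are hypothetical figures, as is the function name.

```python
import math

def cases_needed(sigma, pe_limit):
    """Smallest N for which 0.6745 * sigma / sqrt(N) does not exceed pe_limit."""
    return math.ceil((0.6745 * sigma / pe_limit) ** 2)

n = cases_needed(sigma=12.0, pe_limit=1.0)
probable_error = 0.6745 * 12.0 / math.sqrt(n)

print(n, round(probable_error, 3))
```

Halving the permitted probable error roughly quadruples the number of cases required, since the number of cases enters the formula under a square root.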

This discussion of the application of the basic techniques suggests only a
few of the difficulties that investigators actually encounter in collecting
data. The particular difficulties are so varied that a complete treatment
is not feasible in this volume. A person, especially one inexperienced in
educational research, who contemplates the study of a problem, should
consult reports of similar researches with respect to the techniques
employed in collecting data. The bibliography of survey investigations at
the end of Chapter VIII and that of experimental studies at the end of
Chapter IX are recommended to readers who are interested in extending their
study of collecting data in educational research.

1 Johnson, D. A., and Eurich, A. C. “An Empirical Test of Sampling,”
Journal of Experimental Education, 3:174-79, March, 1935.

ELEMENTARY TECHNIQUES FOR HANDLING DATA ¹

A. Scales and Calculations 

The meaning of numerical data. Numerical data consisting of counts of
things such as the members of a class, full-time teachers employed, books
in libraries, and the like are to be regarded as exact. The scale of
measurement on which such measures are expressed is called discontinuous or
discrete to designate that it has values at only the points designated by
integers. Test scores and most other types of data dealt with in
educational research are expressed on a continuous scale of measurement.
Given sufficiently precise measuring instruments, there is no reason why
test scores and other such measures may not be expressed to any desired
number of decimal places. In practice, however, such data are usually
expressed as integers. In most cases the integer employed is the one
marking the lower limit of the unit division of the scale in which the
exact value of the measure lies. For example, a test score of 27 means that
the exact measure is not less than 27.0 and is not as much as 28.0. In
other words, although the score is given as 27, the exact value may be
anywhere between 27.0 and 28.0. Occasionally, measures are expressed in
terms of the nearest division point of the scale. For example, a measure of
the quality of handwriting expressed as 8 means that the exact measure of
the quality may be anywhere between 7.5 and 8.5. In both cases the data are
to be regarded as only approximate.
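
The two ways of reading a recorded integer may be put side by side in a short sketch. The function and the convention labels "lower_limit" and "nearest" are hypothetical names introduced only for this illustration.

```python
def exact_limits(recorded, convention, unit=1.0):
    """Return the (lower, upper) limits between which the exact measure lies."""
    if convention == "lower_limit":
        # The recorded value marks the bottom of its unit division.
        return (recorded, recorded + unit)
    if convention == "nearest":
        # The recorded value is the nearest division point of the scale.
        return (recorded - unit / 2, recorded + unit / 2)
    raise ValueError("unknown convention: %r" % convention)

test_score = exact_limits(27, "lower_limit")   # test score of 27
handwriting = exact_limits(8, "nearest")       # handwriting quality of 8

print(test_score, handwriting)
```

The upper limit in each case is approached but not reached, so a test score recorded as 27 stands for a value below 28.0.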

1 Other statistical techniques are described in later chapters in
connection with the consideration of types of problems to which they are
applicable. The reader may locate these techniques by employing the topical
index at the end of the volume.

The scale of measurement on which teachers’ salaries are expressed is
theoretically continuous, but in practice it is discontinuous. Salaries are
usually round numbers such as $900, $1000, $1080, $1850, and the like. A
teacher is seldom, if ever, paid a salary of $1203.42. Hence, teachers’
salaries are to be regarded as exact measures, although the scale of
measurement is theoretically continuous.

The number of decimal places or significant figures in the results of
calculation. The fact that much of the data with which we deal in
educational research are only approximate raises the question of the number
of decimal places or significant figures to be retained in the results of
calculation. The situation is complicated by the presence of errors and
other data faults which are to be described in Chapter V, but it will be
helpful to consider the question merely on the basis of the nature of
approximate measures. In the case of the addition or subtraction of
approximate numbers the sum or difference should include only as many
decimal places as appear in the least precise of the items. For example, if
3.9, 8.475, and 1.2846 are to be added, the addends should be rounded off
to one decimal place or the sum should be rounded off to one decimal place.
Writing the sum with a larger number of decimal places will tend to give a
false impression of its precision. If the mean is calculated by dividing
the sum by the number of items, it may be expressed to one additional
decimal place. If the number of items is relatively large, say 50 or more,
a second additional decimal place is appropriate. When a mean is calculated
from a frequency distribution, more conservative rules should be followed.
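
The rule for addition can be traced with the three addends of the example. The sketch below assumes the items are supplied as written strings, so that their decimal places can be counted, and it rounds halves upward; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def decimal_places(text):
    """Number of digits written after the decimal point."""
    return len(text.split(".")[1]) if "." in text else 0

def round_to_places(value, places):
    """Round a Decimal to the given number of decimal places."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

items = ["3.9", "8.475", "1.2846"]
least_precise = min(decimal_places(t) for t in items)  # places in 3.9
total = sum(Decimal(t) for t in items)                 # unrounded sum

reported_sum = round_to_places(total, least_precise)
# The mean is allowed one additional decimal place.
reported_mean = round_to_places(total / len(items), least_precise + 1)

print(total, reported_sum, reported_mean)
```

The unrounded sum, 13.6596, is reported as 13.7, and the mean of the three items as 4.55.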

In the case of multiplication or division, we are concerned with the number
of significant figures rather than the number of decimal places. Zeros
employed merely to locate the decimal point are not considered significant.
For example, each of the measures 742, 7.42, .0742, and .000742 is
described as having three significant figures. A zero between two other
figures is counted in determining the number of significant figures. A zero
introduced in rounding off a number is not counted as significant. For
example, if the population of a city is given as 12,500, meaning thereby
that the exact number of inhabitants is nearer 12,500 than 12,400 or
12,600, the number of significant figures is three. If, however, the
population is known to be exactly 12,500, the number of significant figures
is five. As a means of indicating that an integral number ending in one or
more zeros is to be considered exact, a decimal point may be placed after
the last zero. Thus “12,500.” would be understood as an exact number.

In multiplication or division of approximate numbers, the general rule is
that the number of significant figures in a product or quotient should not
exceed that of the measure having the smaller number of significant
figures. For example, if the diameter of a cross-section of a tree has been
measured as 14 inches and the circumference is calculated by multiplying by
3.1416, the result should be given as 44 and not as 43.9824. The items
should not be rounded off before the multiplication or division is
performed, and in the case of division the quotient should be calculated to
two places more than are to be retained.
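
The tree example may be worked in code as a check on the rule. The helper below is a hypothetical illustration of rounding to a stated number of significant figures; the multiplication is performed on the unrounded items, as the text directs, and only the product is rounded.

```python
import math

def round_to_significant(value, figures):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit: 1 for 43.98, -2 for 0.0742.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)

diameter = 14                      # two significant figures
circumference = diameter * 3.1416  # unrounded product, 43.9824
reported = round_to_significant(circumference, 2)

print(circumference, reported)
```

The diameter carries only two significant figures, so the product is reported as 44.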

A square root may be carried to as many significant figures as there are in
the number. The standard deviation ¹ may be calculated to one or two
decimal places beyond the number appearing in the measures. For example, if
the measures are in terms of integers, the calculation of the standard
deviation should be carried to one or two decimal places. In accomplishing
this, it is advisable to carry the calculations under the radical to four
places.

The reader should bear in mind that these rules are derived from the nature
of approximate measures. If data faults, which are dealt with in Chapter V,
are also considered, more conservative rules would be indicated. Since data
faults vary, it is not possible to state specific rules, but the
investigator should restrict the number of decimal places or significant
figures so that the precision of the calculated statistic will not be
grossly misrepresented.

1 See page 75.

In some cases one may be guided by prevailing practice. Coefficients of
correlation are usually expressed to two decimal places, and conformity
with this practice is appropriate. When the results of calculations are
listed in a table, items of the same type should have the same number of
decimal places.

Computational aids. The arithmetical calculation involved in handling
numerical data frequently requires much time. The research worker may, if
his resources permit, employ clerical assistance for this routine work,
but, in order to insure accuracy, the calculations must be checked. This
checking requires additional time. Hence, it is desirable to utilize such
aids as may be available.

Tables of products ¹ are useful for securing products and quotients. Tables
of logarithms, familiar to one who has studied trigonometry, are useful in
multiplying, dividing, raising to a power, and extracting roots.² Tables of
squares ³ are useful in the calculation of the coefficient of correlation.
In calculating probable errors and coefficients of partial correlation,
tables giving values of $1 - r^2$ and $\sqrt{1 - r^2}$ facilitate
computation.⁴ The tables prepared by Pearson ⁵ may be recommended for
advanced statistical work. The handbook by Dunlap and Kurtz ⁶ is a very
useful tool. It contains several tables, including

1 A. L. Crelle's Calculating Tables. Berlin: Walter de Gruyter and Co.,
1919. (New edition by O. Seeliger.) These tables give all products up to
999 times 999.

Peters, J. Neue Rechentafeln für Multiplikation und Division. Berlin: Druck
und Verlag G. Reimer, 1909. These tables give all products up to 99 times
99,999.

2 For an explanation of how to use logarithms, see Holzinger, K. J.
Statistical Methods for Students in Education. Boston: Ginn and Company,
1928, pp. 47-64.

3 For example, Barlow's Tables of Squares, Cubes, Square Roots, Reciprocals
of All Integer Numbers up to 10,000. New York: Spon and Chamberlain, 1927.
200 pp.

4 Holzinger, K. J. Statistical Tables for Students in Education and
Psychology. Chicago: University of Chicago Press, 1925.

Miner, J. R. Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for Use in Partial
Correlation and in Trigonometry. Baltimore: The Johns Hopkins Press, 1922.
49 pp.

5 Pearson, Karl. Tables for Statisticians and Biometricians. Cambridge:
Cambridge University Press, 1914. (Second edition, 1924; First edition,
Vol. 2, 1931.) 143 pp.

6 Dunlap, J. W., and Kurtz, A. K. Handbook of Statistical Nomographs,
Tables, and Formulas. Yonkers-on-Hudson, New York: World Book Company,
1932. 163 pp.

squares, square roots, reciprocals, and logarithms of numbers; and a number
of nomographs ¹ useful in the calculation of certain statistics, such as
standard error of a mean, intelligence quotient, and bi-serial coefficient
of correlation. It also lists practically all of the formulae used in the
statistical treatment of educational data.

Arithmetical calculations may be accomplished by means of machines. The
Monroe and the Marchant are perhaps the most useful. Although practice is
required for skillful use of a calculating machine, the manipulations for
the several arithmetical operations are easy to learn and the procedure for
calculating a given statistic will soon be memorized. When an appropriate
machine is available, much time can be saved by using it. In some cases the
maximum economy of time requires a formula adapted to the machine. See page
94 for an illustration. The Hollerith Machine is useful in tabulation and
classification when the data are extensive. It may also be used in
calculating coefficients of correlation ² and in solving regression
equations.³ A slide rule may be used when a high degree of accuracy is not
required, but its principal use is in checking calculations accomplished by
other means.

B. Statistics of Frequency Distributions

Summarizing a group of numerical data. When an investigator 
has a collection of data such as test scores, teachers' salaries, 
number of volumes in certain school libraries, and the like, he 
usually calculates the arithmetical mean (average) or median 

1 A nomograph is a graphical device, somewhat similar in principle to the
slide rule, from which the results of a series of calculations can be read.
The readings are only moderately precise. The construction of a nomograph
has been described by Griffin in the following reference. It includes an
annotated bibliography of 73 references.

Griffin, Harold D. “How to Construct a Nomogram,” Journal of Educational
Psychology, 23:561-77, November, 1932.

2 Mendenhall, R. M., and Warren, R. “Computing Statistical Coefficients
from Punched Cards,” Journal of Educational Psychology, 21:53-62, January,
1930.

3 Segel, David. “The Automatic Prediction of Scholastic Success by Using
the Multiple Regression Technique with Electric Tabulating and Accounting
Machines,” Journal of Educational Psychology, 22:139-44, February, 1931.
as a central tendency or summary measure.¹ The standard deviation or other
measure of variability furnishes a further description of the group of
data. These measures may be calculated directly from the data. For example,
if X is used to represent the items of data, N the number of items, and Σ
(capital sigma) the summation or addition of the items, the procedure for
calculating the mean may be expressed by

$$M \text{ (mean)} = \frac{\Sigma X}{N}$$

(See page 76 for calculation of the standard deviation.) Usually, however,
it is desirable to tabulate the measures in a frequency distribution. When
this is done, the mean, standard deviation, and similar measures are
usually calculated from the frequency distribution. This procedure involves
an assumption that will be noted later. Unless this assumption is
approximated, the result obtained will be likely to differ from that
secured by direct calculation from the items.
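
The difference between the two modes of calculation may be seen in a small sketch. The ten scores are hypothetical, an interval of 25 units is assumed, and each score is taken to mean "at least this value," so that the mid-point of an interval lies half an interval above its lower limit.

```python
scores = [212, 187, 455, 160, 230, 199, 151, 88, 176, 305]  # hypothetical
width = 25

# Direct calculation: M = (sum of X) / N.
direct_mean = sum(scores) / len(scores)

# Grouped calculation: every score is replaced by the mid-point of
# the interval in which it falls before the mean is taken.
midpoints = [width * (x // width) + width / 2 for x in scores]
grouped_mean = sum(midpoints) / len(midpoints)

print(direct_mean, grouped_mean)
```

With so few scores the mid-points represent the intervals poorly, and the two means differ by several units; with many scores spread evenly within the intervals the difference would ordinarily be much smaller.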

Constructing a frequency distribution. A frequency distribution consists of
a scale of intervals and of the number of measures or items of data falling
in each interval. The first step in constructing a frequency distribution
is that of dividing the scale from zero or from a convenient point below
the lowest measure to a point at or slightly above the highest measure into
a convenient number of equal intervals or steps.² There is no standard
number of intervals, but usually an effort is made to have not fewer than
ten nor more than twenty-five. The choice of the end-points of the
intervals should be guided by three principles. In the first place they
should be consistent with the nature of the data. If the measures are
approximate and expressed in terms of lower limits of the unit divisions of
the scale of measurement, the end-points should be at unit

1 The mode (see page 74) is sometimes calculated. Certain situations
require the use of the geometric mean defined by the formula
$\log(GM) = \frac{\Sigma \log X}{N}$ or the harmonic mean defined by
$\frac{1}{HM} = \frac{1}{N}\,\Sigma\frac{1}{X}$.

2 Occasionally frequency distributions are formed with unequal intervals,
especially at the extremes. This, however, is seldom done unless the data
are of such a nature as to make it desirable.
division points of the scale. For example, if 17 means at least 17, but not
as much as 18, the precise limits of the interval 15 to 19 are 15.00 and
19.99.... If the measures are expressed in terms of the nearest integer,
for example, if 17 means a value falling between 16.50 and 17.49..., the
lower limit of the interval is still written 15 and the upper one 19. The
precise limits, however, are 14.50 and 19.49....¹ The second principle is
that the choice of the end-points should facilitate tabulation. End-points
that are multiples of 5, 10, or 100 are likely to be more convenient than
ones with other endings. Finally, the end-points should be chosen so that
the average values of the measures in the intervals will approach as nearly
as possible the mid-points of the respective intervals.

The intervals may be designated by writing their lower limits as in
Table I, but, if desired, the upper limits may also be designated. The
tabulation sheet is prepared by writing the scale intervals at the left
side, preferably using ruled paper. If the measures to be tabulated are on
separate sheets or cards, they may be sorted according to scale intervals
and the number of measures in each pile counted and recorded in the
frequency column of the distribution. If sorting is not feasible or is not
considered desirable, the measures may be tallied as shown in Table I. In
tallying, each group of five is represented by four downward strokes, ||||,
and a fifth diagonal stroke, \. This device aids in counting up the
frequencies. The final step is to write the frequencies opposite their
respective scale intervals. Generally it is desirable to copy the scale
intervals and their frequencies on another sheet. The frequency
distribution should be given an appropriate caption as a means of labeling
it.
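
The tallying procedure just described can be imitated in a few lines. The ten scores are hypothetical, the function name is an invention for this sketch, and intervals are designated by their lower limits with each score taken to mean "at least this value."

```python
from collections import Counter

def frequency_distribution(measures, width, start=0):
    """Return (lower limit, frequency) pairs, highest interval first."""
    # Each measure is assigned to the interval whose lower limit it reaches.
    counts = Counter(start + width * ((m - start) // width) for m in measures)
    top = max(counts)
    lower_limits = range(top, start - 1, -width)
    # Intervals holding no measures are kept, with a frequency of zero.
    return [(low, counts.get(low, 0)) for low in lower_limits]

scores = [212, 187, 455, 160, 230, 199, 151, 88, 176, 305]  # hypothetical
distribution = frequency_distribution(scores, width=25)

for low, f in distribution:
    print("%4d  %-6s %2d" % (low, "|" * f, f))
print("N =", sum(f for _, f in distribution))
```

Keeping the empty intervals in the listing preserves the scale, which is needed when deviations are later counted in intervals from an assumed mean.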

For some purposes a cumulative frequency distribution is useful. It is
formed by totaling the frequencies for the successive intervals. Thus, in
Table I the cumulative frequencies would be 1, 4, 8, 16, 20, etc.
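
The cumulative frequencies quoted above can be verified from the frequencies of Table I. In the sketch below the frequencies are listed from the interval 25 upward, on the assumption that the totaling begins at the low end of the scale, as the figures 1, 4, 8, 16, 20 imply.

```python
from itertools import accumulate

# Frequencies from Table I for the intervals 25, 50, 75, ... 475.
frequencies = [1, 3, 4, 8, 4, 15, 13, 10, 8, 3, 3, 3, 3, 0, 0, 0, 1, 1, 1]

# Running total of the frequencies, interval by interval.
cumulative = list(accumulate(frequencies))

print(cumulative[:5])
print(cumulative[-1])
```

The last cumulative frequency equals N, the total number of measures, which serves as a check on the addition.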

1 See footnote on page 69.

Graphical representation of frequency distributions. Many persons are aided
in comprehending a frequency distribution by having it represented
graphically. The scale of the distribution is laid off on a horizontal line
and the frequencies are represented by vertical distances. Graphical
representation is treated briefly in Chapter VIII. For further details the
reader should consult a text that deals with this topic.

Describing a frequency distribution. A frequency distribution is described
by specifying its shape and determining its central tendency and a measure
of its variability or spread. The frequency distributions of large
unselected groups of educational data approach the normal shape, and unless
a statement to the contrary is made, an approximation to a normal
distribution is understood. The mean or the median is usually calculated as
the central tendency. The standard deviation and the median deviation
(probable error) are the most frequently used measures of variability.


Table I. Showing the Tabulation of a Frequency Distribution

Scale Intervals    Tallies                 Frequencies
      475          |                            1
      450          |                            1
      425          |                            1
      400                                       0
      375                                       0
      350                                       0
      325          |||                          3
      300          |||                          3
      275          |||                          3
      250          |||                          3
      225          ||||\ |||                    8
      200          ||||\ ||||\                 10
      175          ||||\ ||||\ |||             13
      150          ||||\ ||||\ ||||\           15
      125          ||||                         4
      100          ||||\ |||                    8
       75          ||||                         4
       50          |||                          3
       25          |                            1
        0

      N =                                      81


Calculating the mean of a frequency distribution. The first step is to
assume a mean. This may be chosen at any point, but the calculation will be
reduced to a minimum if the assumed mean is chosen at the mid-point of the
interval in which the true mean falls. In Table II the mean has been
assumed to fall at the mid-point of the interval from 200 to, but not
including, 225, or at 212.5.¹ The eight scores in the interval 225 to, but
not including, 250, are assumed to be uniformly distributed over it and
hence “on the average” are 25 units, or one scale interval, above the
assumed mean. In order to reduce the calculations to a minimum, this
deviation is called one interval. In the same way the deviations of the
other scores from the assumed mean are expressed in terms of intervals. A
negative deviation means that the scores fall in an interval below the
assumed mean. The deviations are given in the third column of the table. In
the fourth column of the table the products of the deviations and
frequencies are recorded. The sum of the positive products is 80, and the
sum of the negative ones is -132. The difference is -52. This is divided by
the total of the frequencies, which gives a quotient of -.642 of an
interval. Since the interval is 25 units, this quotient is multiplied by 25
to find the correction in terms of units. This correction added
algebraically to the assumed mean gives the true mean of 196.45.²

By employing symbols ³ the procedure just described may


1 The mid-point of a scale or class interval depends on the meaning of the
units employed. When a measure such as a score on a spelling test means,
for example, 16 up to but not including 17, the mid-point of the interval
15-20 (more precisely 15.00-19.99...) is 17.5. If, however, a score of 16
means exactly 16, or means 15.50 to 16.49..., then the interval 15-20 (more
precisely 14.50-19.49...) has as its mid-point 17. Most measures of
achievement and intelligence are interpreted in the same fashion as
spelling scores. Quality of handwriting, merit of compositions, and weights
of children are usually expressed in terms of the nearest integer and the
second procedure should be employed.

2 In order to avoid giving a false impression of precision this result
should be recorded as 196.5. See page 62.

3 A group of measures is commonly represented by the symbol X. When there
are two or more groups of measures subscripts are usually employed, but
occasionally Y is used to designate a second group. When the measures are
expressed as deviations from the mean of the group, small letters are used
as symbols. For further exposition of statistical symbolism and a list of
symbols see Appendix.
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Table II Illustrating the Calculation of the Mean from a 
Frequency Distribution 


Scale Intervals 

Frequency 

/ 

Deviation in 
Intervals 
x' 

fx' 

175 

1 

11 

\ 11 

430 

1 

10 

10 

425 

1 

9 

9 

400 


8 


375 


7 


350 


6 


825 

3 

5 

15 

300 

3 

4 

12 

275 

3 

3 

9 

250 

3 

2 

6 

225 

8 

1 1 

8 

200 

10 

0 

80 

175 

13 

-1 

-13 

150 

15 

~2 

-30 

125 

4 

-3 

-12 

100 

8 

-4 

-32 

75 

4 

~5 

-20 

50 

3 

“6 

-18 

25 

1 

-7 

- 7 

0 



-132 

N 

81 




Assumed mean 212 50 

Correction —16 05 

True mean . , . 196 45 


Correction = % 


N 


= 25 - - g j p- = 25(- 642) 
= -16 05 


be designated by means of a simple formula. Let M represent
the mean to be calculated, M' the assumed mean, x' the
deviations of the intervals from the assumed mean,^1 f the
frequency of the measures in an interval, and i the width of an
interval. Then

$$M = M' + i\,\frac{\Sigma f x'}{N}$$

In this formula Σ (capital sigma) indicates the sum of the
various products formed by multiplying the frequency of an
interval (f) by its deviation (x'). The number of cases or
measures is designated by N.

1 The symbol d' is sometimes used for this purpose.
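
The assumed-mean procedure may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TABLE_II` and `mean_from_frequency_distribution` are introduced here and do not appear in the text, and the frequencies are those of Table II keyed by the lower limit of each interval.

```python
# Frequencies from Table II, keyed by the lower limit of each 25-unit interval.
# (Variable and function names are illustrative assumptions, not from the text.)
TABLE_II = {
    475: 1, 450: 1, 425: 1, 400: 0, 375: 0, 350: 0, 325: 3, 300: 3,
    275: 3, 250: 3, 225: 8, 200: 10, 175: 13, 150: 15, 125: 4, 100: 8,
    75: 4, 50: 3, 25: 1, 0: 0,
}


def mean_from_frequency_distribution(freqs, width, assumed_lower):
    """Return M = M' + i * (sum of f * x') / N for a grouped distribution."""
    assumed_mean = assumed_lower + width / 2      # M': mid-point of the chosen interval
    n = sum(freqs.values())                       # N: total of the frequencies
    # x' is the deviation of each interval from the assumed one, in whole intervals.
    sum_fx = sum(f * ((lower - assumed_lower) // width)
                 for lower, f in freqs.items())
    return assumed_mean + width * sum_fx / n


mean = mean_from_frequency_distribution(TABLE_II, width=25, assumed_lower=200)
print(round(mean, 2))
```

Choosing a different interval for the assumed mean changes only the size of the correction; the calculated mean is the same.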

This method of calculating the mean from a frequency
distribution assumes that the average value of the measures
within an interval falls at the mid-point of the interval. When
this assumption is not satisfied or departures from it are not
neutralized, the calculated mean will be affected. The
assumption is not fully satisfied in a normal distribution if the
intervals are large. The means of the measures of the several
intervals will be slightly nearer the center of the distribution
than the corresponding mid-points, but the effects on the two
sides of the center will neutralize each other. When the
distribution is not normal, non-conformity with the assumption
is likely to affect the results of the calculation. If the
non-conformity is marked, the error may be distinctly
significant, especially when the grouping is coarse. For example,
teachers' salaries are usually concentrated at certain points.
If the points of concentration are not near the middle of the
intervals chosen, the calculated mean may be significantly in
error. It has been suggested that when N is less than 50 and
the width of the interval is greater than one, calculation should
be made directly from the ungrouped data.

It should be noted that when the measures are expressed in
terms of the lower limits of the unit divisions of the scale of
measurement, the mean calculated by dividing the sum of the
items by their number will be too small. When the mean is
calculated from a frequency distribution, the fact that the
measures are expressed in terms of their lower limits is taken
into account and a more accurate value will usually be obtained.
The mean calculated directly from the scores tabulated in
Table II is 196.04. The difference between this value and the
mean calculated from the frequency distribution is probably
typical of the difference to be expected in such cases. If the
measures are expressed in terms of the nearest division point
of the scale of measurement, the calculation of the mean by
adding the measures and dividing the sum by the number of
items will give an accurate result. The mean of such measures
calculated from a frequency distribution will be slightly too
large unless the limits of the intervals are properly chosen or
the nature of the measures is recognized in the value assigned
to the assumed mean.

Occasionally the research worker is confronted with the
necessity of calculating the mean from a frequency distribution
whose intervals are not equal in width. When this condition
prevails, it is only necessary that the deviations of the
mid-points of the several intervals from the assumed mean be
correctly expressed.

Calculating the median. As the term implies, the median is
the point on the scale on each side of which one-half of the
measures fall.^1 Sometimes this term is used in the sense of
mid-measure, which is defined as the middlemost measure of a
series of measures arranged in ascending or descending order of
magnitude. If the number in the series is odd, the mid-measure
is that one above and below which there is an equal number
of measures. If the number of measures is even, the mid-measure
is taken as the average of the two mid-most items. If the scale
of measurement is continuous the meaning of the numerical
expressions of the measures should be recognized. For example,
if the mid-measure is the third of five and is expressed as 17,
meaning 17 up to but not including 18, its value should be
17.5 rather than 17.

1 If considered critically in the case of a limited number of measures, this
definition is indefinite. For practical purposes, the most precise definition is
expressed by the formulae for calculating the median.

The median of a frequency distribution is calculated by means
of the following formulae which serve as a check on each other:

$$Md = l + \frac{\frac{N}{2} - S_l}{f}\, i$$

$$Md = u - \frac{\frac{N}{2} - S_u}{f}\, i$$

The symbol l refers to the lower limit of the class interval
in which the median is estimated to fall; u is the symbol for
the upper limit of this class interval; S_l designates the sum of
the frequencies (that is, the number of measures) below the
lower limit of the class interval in which the median is
estimated to fall; and S_u the number of measures above the
upper limit of this interval. The symbol i refers to the width
of the scale interval and the symbol f designates the number of
measures in the class interval within which the median is
estimated to fall. The use of these formulae is illustrated by
the following calculation of the median of the distribution
given in Table II.

$$Md = 175 + \frac{\frac{81}{2} - 35}{13} \times 25 = 185.58$$

$$Md = 200 - \frac{\frac{81}{2} - 33}{13} \times 25 = 185.58$$

The choice of the scale interval 175 up to, but not including,
200 is made as the result of inspection. Approximately 40
measures must be above and below the median. Hence, counting
in from either end soon reveals that the median is in the
interval chosen.
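
The two median formulae, and the way each checks the other, may be sketched as follows. This is a minimal illustration, not a prescribed routine; the names `TABLE_II`, `WIDTH`, and `median_both_ways` are assumptions introduced here, and the data are the Table II frequencies listed from the lowest interval upward.

```python
# Table II as (lower limit of interval, frequency), lowest interval first.
# Names are illustrative assumptions, not taken from the text.
TABLE_II = [
    (25, 1), (50, 3), (75, 4), (100, 8), (125, 4), (150, 15), (175, 13),
    (200, 10), (225, 8), (250, 3), (275, 3), (300, 3), (325, 3),
    (350, 0), (375, 0), (400, 0), (425, 1), (450, 1), (475, 1),
]
WIDTH = 25


def median_both_ways(table, width):
    """Return the median computed from the lower and from the upper limit."""
    n = sum(f for _, f in table)
    half = n / 2
    below = 0                                   # S_l: cases below the current interval
    for lower, f in table:
        if f > 0 and below + f >= half:         # interval in which the median falls
            above = n - below - f               # S_u: cases above this interval
            upper = lower + width
            md_from_lower = lower + (half - below) / f * width
            md_from_upper = upper - (half - above) / f * width
            return md_from_lower, md_from_upper
        below += f
    raise ValueError("empty distribution")


md_from_lower, md_from_upper = median_both_ways(TABLE_II, WIDTH)
print(round(md_from_lower, 2), round(md_from_upper, 2))
```

Both values agree, which is the check the text describes.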

Other points in frequency distributions. Other points on the
scale of a frequency distribution may be obtained with slight
adaptation of the formulae given above. Q1, the first quartile,
the point above which are three-fourths of the measures and
below which are one-fourth of the measures, can be obtained
by inserting, respectively, N/4 and 3N/4 for N/2 in the
formulae and modifying the meaning of l and u accordingly.
Q3, the third quartile, the point below which are three-fourths
of the measures and above which are one-fourth of the measures,
can be obtained by inserting, respectively, 3N/4 and N/4. The
decile points, dividing the scale in tenths, may be obtained by
substituting in the first formula N/10, 2N/10, 3N/10, 4N/10,
5N/10, 6N/10, 7N/10, 8N/10, and 9N/10 for N/2. These
computations may be checked by substituting 9N/10, 8N/10,
7N/10, 6N/10, 5N/10, 4N/10, 3N/10, 2N/10, and N/10 for N/2
in the second formula. Percentile points may be secured by
similar substitution of appropriate fractions. For example, the
63 percentile point would be obtained by substituting 63N/100
in the first formula and 37N/100 in the second.
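
The substitution of other fractions of N in the first formula may be sketched as a single routine. This is a minimal illustration under stated assumptions: `percentile_point` and `TABLE_II` are names introduced here, and the data are the Table II frequencies.

```python
# Table II as (lower limit of interval, frequency), lowest interval first.
# Names are illustrative assumptions, not taken from the text.
TABLE_II = [
    (25, 1), (50, 3), (75, 4), (100, 8), (125, 4), (150, 15), (175, 13),
    (200, 10), (225, 8), (250, 3), (275, 3), (300, 3), (325, 3),
    (350, 0), (375, 0), (400, 0), (425, 1), (450, 1), (475, 1),
]


def percentile_point(table, width, fraction):
    """Scale point below which `fraction` of the cases fall (first formula)."""
    n = sum(f for _, f in table)
    target = fraction * n                       # e.g. N/4, 3N/4, 63N/100
    below = 0                                   # cases below the current interval
    for lower, f in table:
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = percentile_point(TABLE_II, 25, 0.25)       # first quartile
q3 = percentile_point(TABLE_II, 25, 0.75)       # third quartile
p63 = percentile_point(TABLE_II, 25, 0.63)      # 63 percentile point
print(round(q1, 2), round(q3, 2), round(p63, 2))
```

With the fraction set at one-half the routine returns the median of the distribution.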


The mode. The mode is occasionally used in describing a
frequency distribution. In a simple series of measures, the mode
is the most frequently occurring measure. In a frequency
distribution, the crude mode is defined as the scale interval
which has the largest frequency. This measure of central
tendency, however, is obviously unsatisfactory because it
depends upon the grouping of the measures made in forming the
frequency distribution. Several formulae have been proposed for
calculating a modal point, but they are seldom used.^1

1 Five formulae for calculating the mode are given by Dunlap, J. W., and
Kurtz, A. K. Handbook of Statistical Nomographs, Tables, and Formulas.
Yonkers-on-Hudson, New York: World Book Company, 1932, p. 105.

Calculating measures of variability of a frequency
distribution. A mean or median represents only the central
tendency of a frequency distribution. Since such assemblages of
data differ in the spread or variability, a measure of this
characteristic is a useful supplementary description. The range
is an easily understood measure of variability. It is the
distance on the scale of measurement from the lowest to the
highest measure. A single extreme case may influence the size
of this measure unduly. Hence, the 10-90 percentile range,
D10-90, is frequently used in place of the total range. It is
obtained by subtracting the value of the tenth percentile point
from that of the ninetieth percentile point. The quartile
deviation, Q, is obtained by subtracting Q1 from Q3 and
dividing the difference by 2. When this distance is measured
off on both sides of the median, the space thus defined includes
50 per cent of the measures, provided the distribution is normal
in shape. Another measure of variability is the average
deviation (AD). As the term suggests, it is the mean of the
deviations from the median or mean, the sign of the deviations
being disregarded. It, however, is not very frequently used.^1

1 For the procedure to be used in computing the average deviation see any
standard statistical text.

The most commonly used measure of variability is the
standard deviation, which is the square root of the mean of the
squares of the deviations from the mean.^2 This statistic is
generally represented by the symbol^3 σ. When the measures are
not grouped in a frequency distribution, the calculation is
indicated by the formula

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$$

For a frequency distribution, it is defined by the formula

$$\sigma = i\,\sqrt{\frac{\Sigma f x^2}{N}}$$

In this formula Σ (capital sigma) designates the summation of
all products of f and x², and i the width of the intervals or
steps of the frequency distribution, the deviations x being
expressed in intervals.

2 This is the original definition of the standard deviation and the one generally
recognized. Unfortunately, some writers have defined it in other terms. For a
discussion of the various definitions, see Eells, W. C. “A Plea for a Standard
Definition of the Standard Deviation,” Journal of Educational Research,
13:45-52, January, 1926.

3 This symbol is the small Greek letter “sigma.” Occasionally SD is used
to designate standard deviation, but this practice is not recommended.

Table III illustrates the calculation of the standard deviation
from the frequency distribution. The procedure, up to a certain
point, is identical to that for calculating the mean.^1 In
the last column of the table, the products of the frequency and
the square of the deviation of each interval are entered. The
deviations have been expressed from the assumed mean 212.5.
The true mean is 196.45. It is, therefore, necessary to correct
for the error introduced by this assumption. This error is
-.642 of an interval. The procedure for making the correction
is indicated by the following formula in which c designates the
difference between the true and the assumed means, expressed
in intervals:

$$\sigma = i\,\sqrt{\frac{\Sigma f (x')^2}{N} - c^2}$$


Table III. Illustrating the Calculation of the Standard
Deviation of a Frequency Distribution

Scale Intervals      f       x'      fx'     f(x')²
      475            1       11       11      121
      450            1       10       10      100
      425            1        9        9       81
      400                     8
      375                     7
      350                     6
      325            3        5       15       75
      300            3        4       12       48
      275            3        3        9       27
      250            3        2        6       12
      225            8        1        8        8
      200           10        0
      175           13       -1      -13       13
      150           15       -2      -30       60
      125            4       -3      -12       36
      100            8       -4      -32      128
       75            4       -5      -20      100
       50            3       -6      -18      108
       25            1       -7       -7       49
        0

Sum of positive products              80
Sum of negative products            -132
N                   81                        966

$$\sigma = 25\sqrt{\frac{966}{81} - (-.642)^2} = 25\sqrt{11.5138} = 25(3.39) = 84.75$$
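
The calculation of the standard deviation from deviations about an assumed mean, with the correction for the error of that assumption, may be sketched as below. This is an illustration only; `TABLE_III` and `sd_from_frequency_distribution` are names assumed here, and the correction term is taken as the square of the mean of the products f times x', in intervals.

```python
import math

# Frequencies of Table III, keyed by the lower limit of each 25-unit interval.
# (Names are illustrative assumptions, not taken from the text.)
TABLE_III = {
    475: 1, 450: 1, 425: 1, 400: 0, 375: 0, 350: 0, 325: 3, 300: 3,
    275: 3, 250: 3, 225: 8, 200: 10, 175: 13, 150: 15, 125: 4, 100: 8,
    75: 4, 50: 3, 25: 1, 0: 0,
}


def sd_from_frequency_distribution(freqs, width, assumed_lower):
    """Standard deviation from interval deviations about an assumed mean."""
    n = sum(freqs.values())
    sum_fx = 0          # sum of f * x'
    sum_fx2 = 0         # sum of f * (x')^2
    for lower, f in freqs.items():
        dev = (lower - assumed_lower) // width   # x', in whole intervals
        sum_fx += f * dev
        sum_fx2 += f * dev * dev
    c = sum_fx / n                               # error of the assumed mean, in intervals
    return width * math.sqrt(sum_fx2 / n - c * c)


sigma = sd_from_frequency_distribution(TABLE_III, width=25, assumed_lower=200)
print(round(sigma, 2))
```

Carrying the square root to more decimal places gives about 84.83; the 84.75 of Table III results from rounding the root to 3.39 before multiplying by the interval.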


1 See page 70. The deviations are taken from an assumed mean.

In terms of raw, ungrouped measures, the calculation of the
standard deviation may be accomplished as indicated by the
formula

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - M^2}$$

This procedure avoids the necessity of setting up the frequency
distribution and is economical when an appropriate calculating
machine is available. For example, using a Monroe Calculating
Machine, lock the “1” of the extreme left column of the
keyboard and shift the carriage to the left. Then punch,
successively, at the right of the keyboard, the various values
of X, multiplying each by itself. The final readings on the
lower dial will be ΣX at the left and ΣX² at the right. The mean
is found by dividing ΣX by N. The subtraction of M² may be
accomplished by subtracting the mean M times from the quotient
obtained by dividing ΣX² by N.
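
The raw-score procedure, in which only the sum of the measures and the sum of their squares are accumulated, may be sketched as follows. The scores are hypothetical and the function name is an assumption introduced here; the result is compared with `statistics.pstdev`, which also divides by N.

```python
import math
import statistics

# Hypothetical raw measures, used only to illustrate the arithmetic.
scores = [12, 15, 15, 17, 20, 22, 25]


def sd_from_raw_scores(xs):
    """Square root of (sum of X squared over N) less the square of the mean."""
    n = len(xs)
    sum_x = sum(xs)                      # the machine's left-hand reading
    sum_x2 = sum(x * x for x in xs)      # the machine's right-hand reading
    mean = sum_x / n
    return math.sqrt(sum_x2 / n - mean ** 2)


print(sd_from_raw_scores(scores), statistics.pstdev(scores))
```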

The median deviation, MdD, is the median of the deviations
from the mean. The term probable error, PE, has the same
meaning but should be used only when the distribution is one of
errors. The median deviation of a distribution is customarily
computed by multiplying the standard deviation (standard error)
by the constant .6745.

$$MdD = .6745\,\sigma$$

This procedure assumes a normal distribution or one which
deviates from it very slightly.

Additional items of description when the distribution is not
normal. When a frequency distribution does not approach the
normal curve in shape, the central tendency and a measure of
variability do not furnish an adequate description, especially
when the departure from the normal shape is marked. If the
measures on one side of the mean or median tend to be bunched
and there is tailing out on the other, the distribution is said to
be skewed. In such cases, it is desirable to obtain a measure
of this abnormality or skewness. In a normal distribution the
mean and the median coincide. In a skewed distribution they
do not, and the extent to which these measures are separated
affords an index of the degree of skewness. The following
formula is one commonly used:^1

$$Sk = \frac{3(M - Md)}{\sigma}$$

This formula indicates, in addition to the magnitude of the
skewness, whether it is “positive” or “negative” in character.
A positively skewed curve is one which is drawn out further to
the right than to the left; the mean is greater than the median
in scale value, and the curve slopes steeply on the left, but
gently on the right. A negatively skewed curve is one in which
the curve is drawn out more to the left; the median is greater
than the mean, and the slope on the left is gentle but steep on
the right.

A distribution may tend to be symmetrical about the mean
and hence approximate zero in skewness but exhibit other
abnormalities. If it tends to be rectangular in shape, the
standard deviation will tend to be a misleading measure. In such
a case, it is desirable to calculate the kurtosis which is defined
by the formula:^2

$$Ku = \frac{N\,\Sigma x^4}{(\Sigma x^2)^2} - 3$$

For a normal distribution, the kurtosis is zero.

Measures of skewness and of kurtosis are not often calculated,
but unless the shape of a distribution is approximately normal,
a description limited to a central tendency and a measure of
variability is not complete.

1 Dunlap, J. W., and Kurtz, A. K. Handbook of Statistical Nomographs, Tables,
and Formulas. Yonkers-on-Hudson: World Book Company, 1932, p. 112. Eight
formulae are given.

2 This formula is written in other forms. Sometimes the “3” is omitted.
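
A small sketch may make the two indices concrete. It rests on assumptions that should be kept in view: the skewness index is taken here to be three times the difference between mean and median divided by the standard deviation, which is one of the several formulae in use, and the kurtosis is taken with the 3 subtracted so that a normal distribution gives zero. The function name and the two sets of measures are hypothetical.

```python
import math
from statistics import mean, median


def skewness_and_kurtosis(xs):
    """Return (Sk, Ku) for a list of ungrouped measures."""
    n = len(xs)
    m = mean(xs)
    devs = [x - m for x in xs]                  # small x: deviations from the mean
    sum_x2 = sum(d ** 2 for d in devs)
    sum_x4 = sum(d ** 4 for d in devs)
    sigma = math.sqrt(sum_x2 / n)
    # Assumed skewness index: three times (mean - median), in sigma units.
    sk = 3 * (m - median(xs)) / sigma
    # Kurtosis with the 3 subtracted, so that a normal curve gives zero.
    ku = n * sum_x4 / sum_x2 ** 2 - 3
    return sk, ku


symmetrical = [1, 2, 3, 4, 5]                   # hypothetical measures
tailed_right = [1, 1, 2, 2, 3, 9]               # hypothetical measures
print(skewness_and_kurtosis(symmetrical))
print(skewness_and_kurtosis(tailed_right))
```

The symmetrical, nearly rectangular set gives zero skewness and a negative kurtosis; the set with a tail to the right gives a positive skewness.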

Identification of normal distributions. The normal curve,
which is approximated in shape by the graphical representation
of the frequency distribution of unselected groups of many
types of educational data, is defined by the equation

$$y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^2}{2\sigma^2}}$$

in which N is the total of the frequencies of the distribution
and e the base of the Naperian logarithms. A normal
distribution is also obtained by expanding the binomial
(p + q)^n and taking the coefficients of the terms of the
resulting polynomial as the frequencies. By making n large, the
graphical representation of these frequencies will approach a
smooth curve.
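
The approach of the binomial frequencies to the normal curve may be checked numerically. The sketch below assumes p = q = 1/2, for which the mean is n/2 and the standard deviation is half the square root of n; the function names are introduced here for illustration.

```python
import math


def binomial_frequencies(n):
    """Coefficients of the expansion of (p + q)^n, used as frequencies."""
    return [math.comb(n, k) for k in range(n + 1)]


def normal_ordinate(x, total, sigma):
    """Ordinate y of the normal curve at deviation x from the mean."""
    return (total / (sigma * math.sqrt(2 * math.pi))
            * math.exp(-x * x / (2 * sigma * sigma)))


n = 20
freqs = binomial_frequencies(n)
total = sum(freqs)                 # N, equal to 2**n
mean = n / 2                       # mean when p = q = 1/2 (assumed)
sigma = math.sqrt(n) / 2           # standard deviation when p = q = 1/2

# Compare each binomial frequency with the corresponding normal ordinate.
for k in (6, 8, 10, 12, 14):
    print(k, freqs[k], round(normal_ordinate(k - mean, total, sigma)))
```

Even with n as small as 20 the two columns lie close together, and the agreement improves as n is made larger.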

A measure of skewness affords an indication of the degree of
departure from the normal shape, but it should be regarded as
only a crude index of the degree of normality. A distribution
may be symmetrical, the opposite of skewed, and not be normal.
A more precise means of discovering whether or not a given
distribution departs significantly from the normal shape is
Pearson's Chi-Square Test.^1 The use of this test demonstrates
whether or not the normal curve fits the observed data within
the fluctuations of random sampling.

1 Pearson, Karl. “On the Probability that Two Independent Distributions of
Frequency Are Really Samples from the Same Population,” Biometrika,
8:250-64, 1911.

Pearson, Karl. “On a Brief Proof of the Fundamental Formula for Testing
the Goodness of Fit of Frequency Distributions and on the Probable Error of
P,” Philosophical Magazine, 30:369, 1916.

Pearson, Karl. “On the Test of Goodness of Fit,” Biometrika, 14:186-91,
1922.

Pearson, Karl. “Further Note on the Test of Goodness of Fit,” Biometrika,
14:418, 1922.

Explanation and illustration of the use of this test are given in
Holzinger, K. J. Statistical Methods for Students in Education. Boston: Ginn
and Company, 1928, pp. 245-48.

The relation between deviations from the mean and
corresponding areas under the normal curve. The normal curve
has a number of interesting properties,^1 but in educational
research we are concerned mainly with the fraction of the total
area that is cut off when perpendiculars are erected at points
on the base line. The area cut off in this way may be determined
by the methods of integral calculus,^2 but the calculation is
laborious and tables have been prepared from which the areas
corresponding to various deviations from the mean may be read.
These tables vary in form and in the labels employed to
designate the quantities given. The abscissa distances
(deviations from the mean) are usually given in terms of σ as a
unit and this fact is indicated by labeling the column x/σ. A
few authors express the abscissa distances in terms of PE. The
area given is usually that included between the mean (x = 0)
and the designated value of x/σ. Several symbols have been used
for designating this area. In his text^3 Holzinger, following the
lead of Sheppard and Pearson, uses ½α, but in his volume of
tables^4 he employs the expression “area from x/σ = 0 to given
x/σ.” In the Kelley-Wood Table^5 I is used as the symbol.
Sometimes the area given is that from the extreme left up to the
point defined by x/σ. This area is designated by ½(1 + α) by
Sheppard and Pearson. Kelley uses p. The total area is usually
taken as 1.000 but Garrett^6 uses 10,000. The choice of the
total area is immaterial when only the fraction of the total
area is desired.

1 For an account see Kelley, T. L. Statistical Method. New York: The
Macmillan Company, 1923, Chapter V.

2 See Rietz, H. L., et al. Handbook of Mathematical Statistics. Boston:
Houghton Mifflin Company, 1924, p. 14, for the probability integral.

3 Statistical Methods for Students in Education, p. 211.

4 Holzinger, K. J. Statistical Tables for Students in Education and Psychology.
Chicago: University of Chicago Press, 1925, Tables XI and XII.

5 Kelley, op. cit., Appendix C.

6 Garrett, H. E. Statistics in Psychology and Education. New York: Longmans,
Green and Company, 1926, p. 91.

The ratio of a section of the area to the total area under
the normal curve expresses the probability that a datum
(measure) selected at random from the population represented by
the distribution is within the section specified. For example,
if the section is that marked off by measuring .6745σ in both
directions from the mean, the ratio of this area to the total area
is 1/2. Hence, the chances are one to one that a datum selected
at random will fall within this section. If the section specified
is that to the left of the point M - .6745σ, the corresponding
ratio is 1/4. The chances are one to three, or one out of four,
that a datum selected at random will fall within this section.
Given a normal distribution, a measure of its variability
(standard deviation or median deviation) and the mean, the
probability that a datum selected at random from the
population will fall within certain limits can be determined from
the specifications of these limits. These limits may be between
two points defined by measuring specified distances from
the mean or they may be designated as merely beyond,
i.e., to the right or to the left of a specified point. If the
probability is given, the limits corresponding to it may be
determined.
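
The readings ordinarily taken from a table of the probability integral may be reproduced with the `NormalDist` class of Python's standard `statistics` module. The sketch below is illustrative only; the function names are assumptions introduced here, and the total area is taken as 1.000.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, sigma 1, total area 1.000


def area_mean_to(z):
    """Area between the mean and the point x/sigma = z."""
    return std_normal.cdf(z) - 0.5


def area_left_of(z):
    """Area from the extreme left up to the point x/sigma = z."""
    return std_normal.cdf(z)


def prob_within(z):
    """Probability of falling within z sigma on either side of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


def limit_for(prob_left):
    """Point x/sigma to the left of which the given fraction of the area lies."""
    return std_normal.inv_cdf(prob_left)


print(round(prob_within(0.6745), 3))     # section M - .6745 sigma to M + .6745 sigma
print(round(area_left_of(-0.6745), 3))   # section to the left of M - .6745 sigma
print(round(limit_for(0.75), 4))         # limit corresponding to a given probability
```

The last call illustrates the reverse determination mentioned in the text, in which the probability is given and the corresponding limit is sought.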

Most of our use of a table of the values of the probability
integral is with three distributions: (1) that formed by the
values of a statistic computed from a large number of similar
but independent random samples from a universe or large
population, (2) that of the variable errors included in data, and
(3) that of the errors of estimate when predictions have been
made within a typical population. The first distribution may
also be thought of as the distribution of the differences between
the calculated values of the statistic and the value of the
statistic for the universe or larger population from which the
samples have been taken. These differences may be regarded
as the errors of the calculated values of the statistic when they
are taken as the value of the statistic for the universe or larger
population. Hence, the distribution may be referred to as one
of errors. The standard deviation of a distribution of one of
these types is frequently designated the standard error, and the
median deviation is properly called the probable error.

None of these distributions is actually formed in typical
statistical work, but assuming that they are normal, it is
possible under certain conditions to calculate their standard
deviations. Hence, either of the two determinations noted above
may be easily accomplished with the aid of a table of the values
of the probability integral. The distribution of the values of a
statistic from similar but independent random samples or the
distribution of the corresponding errors will be referred to later
in the present chapter. The distribution of variable errors in
data is dealt with under the head of the probable error of
measurement in Chapter V. Errors of estimate are considered in
Chapter X.

C. Calculation of Comparable Measures 

Calculating comparable measures. Two sets of measures
considered to describe magnitudes of the same thing^1 are
comparable provided zero on one scale of measures is equivalent
to zero on the second and the units of the two scales are such
that n units on one may be considered to designate a magnitude
equivalent to that represented by n units on the other. When
these conditions do not prevail, a prerequisite for comparisons
is the reduction of one set of measures to the basis of the other
or the reduction of both to a common basis.

1 The meaning of “same thing” should be noted. From one point of view a
silent reading test and an arithmetic test do not measure the same thing. One
measures what we call silent reading ability, and the other arithmetical ability.
Both, however, measure achievement and from this point of view scores yielded
by them may be considered to be measures of the same thing. Similarly,
measures of height and measures of weight may be considered measures of
physical maturity.

The procedures most commonly employed for effecting a
reduction are based on the assumptions that the means of two or
more groups of measures of the same thing, or certain points
measured from them, qualify as a common zero point, that the
standard deviations of the distributions may be regarded as
designating equivalent magnitudes, and that the ratio of
corresponding deviations from the mean is the same as that given
by the standard deviations.^1 Employing symbols, these
assumptions are expressed by

$$\frac{X_1 - M_1}{\sigma_1} = \frac{X_2 - M_2}{\sigma_2} = \frac{X_3 - M_3}{\sigma_3} = \cdots$$

These expressions designate deviation measures expressed in
terms of the standard deviation as the unit.^2 Such measures
are designated as “standard” and represented by the symbol z.
Since the standard deviation is a relatively large unit, decimals
will be required in expressing precise standard measures and
in the case of those less than the mean, negative numbers will
be necessary. In order to avoid these rather inconvenient
features, a transformation is usually made to an arbitrary scale
which has a convenient mean and whose unit is a fraction of
the standard deviation. A convenient scale is one whose mean
is 50 and whose standard deviation is 14.^3 Inserting these
values and using Z' to designate the transmuted scores, we have

$$\frac{Z'_1 - 50}{14} = \frac{X_1 - M_1}{\sigma_1}$$

$$Z'_1 = 50 + \frac{14}{\sigma_1}\,(X_1 - M_1)$$

Similar formulae may be obtained for other sets of measures
merely by changing the subscripts.
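
The transmutation of two sets of scores to a common scale may be sketched as follows. The scores are hypothetical, the function name is an assumption introduced here, and the defaults correspond to the scale with mean 50 and standard deviation 14 described above; other scales are obtained by changing the two arguments.

```python
from statistics import mean, pstdev


def transmuted_scores(xs, new_mean=50.0, new_sd=14.0):
    """Convert raw scores X to Z' = new_mean + (new_sd / sigma) * (X - M)."""
    m = mean(xs)
    sigma = pstdev(xs)             # square root of the mean squared deviation
    return [new_mean + new_sd / sigma * (x - m) for x in xs]


reading = [12, 15, 18, 22, 28]             # hypothetical scores on one test
arithmetic = [40, 55, 60, 65, 80]          # hypothetical scores on another

z_reading = transmuted_scores(reading)
z_arithmetic = transmuted_scores(arithmetic)
print([round(z, 1) for z in z_reading])
print([round(z, 1) for z in z_arithmetic])
```

After the transformation both sets have the same mean and the same standard deviation, so that a given transmuted score designates the same relative position in either distribution.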

The T-scores proposed by McCall^4 are based upon this
principle. He chose a scale of 100 units with the mean at 50 and
with a standard deviation of 10 for converting the scores of an
unselected group of twelve-year-old pupils. Worlton^5 has
described the sigma index score which is based on a scale with
the mean at 100 and with a standard deviation of 20.

1 For an illustration in which the median and quartile deviation are used, see
Heilman, J. D. “The Translation of Scores into Grades,” Journal of Educational
Psychology, 24:241-56, April, 1933.

2 This procedure was proposed by Woodworth in 1912.
Woodworth, R. S. “Combining the Results of Several Tests: A Study in
Statistical Method,” Psychological Review, 19:97-123, March, 1912.

3 See Hull, C. L. “The Conversion of Test Scores into Series Which Have Any
Assigned Mean and Degree of Dispersion,” Journal of Applied Psychology,
6:298-300, September, 1922.

4 McCall, W. A. How to Measure in Education. New York: The Macmillan
Company, 1922, pp. 272-306.

5 Worlton, J. T. “The Sigma Index Score as a Standard Measuring Unit,”
Elementary School Journal, 30:354-62, January, 1930.

Some workers have transformed scores into percentile ranks.^1
The percentile rank of a measure is the per cent of cases in the
distribution below the given measure. Its approximate value is
that of the nearest percentile point and may be obtained by
calculating this point. Precise values are given by the formula

$$R_X = \frac{100\,[\,f(X - l) + S_l\, i\,]}{N i}$$

where R_X is the percentile rank of score X, f is the frequency
of the interval in which X occurs, l is the lower limit of this
interval, S_l is the number of cases below this lower limit, i is
the number of scale units in the interval, and N is the total
number of cases.^2 Usually, the approximate determination
mentioned above is sufficiently precise.^3 If the fifth, tenth,
fifteenth, etc., percentile points are calculated, the percentile
ranks for intermediate scores may be found by interpolation.
Percentile values may also be obtained graphically from the
cumulative frequency curve or from the ogive or percentile curve.
These techniques, however, do not give highly precise results.
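
The precise formula for the percentile rank may be sketched as below, using the frequencies of Table II as data. The names `TABLE_II` and `percentile_rank` are assumptions introduced for the illustration.

```python
# Table II as (lower limit of interval, frequency), lowest interval first.
# Names are illustrative assumptions, not taken from the text.
TABLE_II = [
    (25, 1), (50, 3), (75, 4), (100, 8), (125, 4), (150, 15), (175, 13),
    (200, 10), (225, 8), (250, 3), (275, 3), (300, 3), (325, 3),
    (350, 0), (375, 0), (400, 0), (425, 1), (450, 1), (475, 1),
]


def percentile_rank(table, width, score):
    """Per cent of cases below `score`, interpolating within its interval."""
    n = sum(f for _, f in table)
    below = 0                                   # S_l: cases below the interval
    for lower, f in table:
        if lower <= score < lower + width:      # interval in which the score occurs
            return 100 * (f * (score - lower) / width + below) / n
        below += f
    raise ValueError("score lies outside the tabulated intervals")


print(round(percentile_rank(TABLE_II, 25, 200), 1))
print(round(percentile_rank(TABLE_II, 25, 175 + 5.5 / 13 * 25), 1))
```

The second score is the median of the distribution, and its percentile rank is accordingly 50.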
The unit divisions of the percentile scale are not equal. Those
at the extremes are larger than those near the center of the
distribution. Also, irregularities in the shape of the distribution
tend to produce variations in the scale. Since distributions of
raw scores usually present some abnormalities, due to
imperfections in the measuring instrument or to selection of the
group tested, it has been proposed that the distributions of raw
scores be transformed into the normal shape.^4 The method,
however, should not be applied when the number of cases is less
than three or four hundred, and when its use is appropriate, the
labor involved will not often be justified. See Chapter VIII for
further discussion of percentile ranks and points.

1 For a distinction between “standard percentile ranks” and “centile ranks”
see Rogers, D. C. “An Argument for Centile Ranks,” Journal of Educational
Psychology, 24:107-17, February, 1933.

2 Holzinger, op. cit., p. 138.

3 For a method of machine calculation for percentile ranks, see Thurstone,
L. L. “Note on the Calculation of Percentile Ranks,” Journal of Educational
Psychology, 18:617-20, December, 1927.

4 Horst, Paul. “A Method for Transforming Any Unimodal Frequency
Distribution into a Normal Distribution,” Journal of Educational Psychology,
24:129-39, February, 1933.

Measures expressed as age scores are considered comparable. 
Intelligence quotients are also generally considered comparable, 
but Miller ^ and Kefauver ^ have shown that they are not. The 
latter gives a table from which equivalent values for certain 
tests may be read 

Sometimes comparable measures may be secured by calculat- 
ing ratios For example, if an investigator desires to compare 
pupils with reference to the errors made in a set of compositions 
that vary in length, the number of errors per 100 words may 
be calculated. If it is desired to compare a number of schools 
with reference to increase in enrollment, ratios commonly ex- 
pressed as per cents are usually calculated as a means of mak- 
ing a comparison This method is sound if the several bases 
are expressed from absolute zero points. This would be true 
in the case of the two illustrations just noted. If gains in test 
scores are being compared, ratios may be used, provided they 
are labeled “per cent of increase in test scores ” If the ratios 
are labeled or interpreted as “per cent of increase in achieve- 
ment,^' they probably are not comparable because the measures 
of achievement are not expressed from absolute zero points 

D. Statistics of Relationship 

Techniques for studying the relationship between two sets of
paired measures. Two sets of measures, such as the chronological
ages and intelligence quotients of a group of students or
average marks in high school and average marks in college for
the same students, are said to be paired because there are two
measures for each individual. The degree of relationship refers
to the degree to which the larger measures in one set are paired
with the larger in the other and the smaller in the one are paired
with the smaller in the other, or the degree to which the smaller
measures in one set are paired with the larger measures and
the larger in the one are paired with the smaller in the other.
The relationship in the first type of pairing is designated as
positive and that in the second as negative or inverse.

Horst, Paul. “Comparable Scores from Skewed Distributions,” Journal of
Experimental Psychology, 15:465-68, August, 1932.

Horst, Paul. “A Routine Procedure for Obtaining Comparable Scores,”
Journal of Applied Psychology, 16:324-30, June, 1932.

¹ Miller, W. S. “The Variation and Significance of Intelligence Quotients
Obtained from Group Tests,” Journal of Educational Psychology, 15:359-66,
September, 1924.

² Kefauver, G. N. “Need of Equating Intelligence Quotients Obtained from
Group Tests,” Journal of Educational Research, 19:92-101, February, 1929.

The degree of relationship between two sets of paired data
may be studied in several ways. One of the simplest is to write
the measures in parallel columns, placing the paired items opposite
each other. Such an arrangement will be more meaningful
if the measures in one column are arranged in order of magnitude.
This procedure is crude, and when the number of pairs
is large, it is more helpful to make a tabulation, as shown in
Table IV. Such a correlation table may be partially summarized
by calculating the central tendency for each of the columns or
each of the rows. The central tendencies thus obtained may
be represented graphically as ordinates using the scale divisions
of the other dimension of the table as abscissas. The line that
“best fits” the several points thus located summarizes the
relationship between the two sets of paired data.¹ One may
also prepare a scatter diagram by locating the points corresponding
to the several pairs of measures. For some purposes a graphical
representation of the averages of the several columns or
rows, or a scatter diagram, affords a very helpful means of
studying relationship, but usually a coefficient of correlation
should be calculated as an index of the degree of relationship.
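
The partial summary of a correlation table by the means of its columns may be sketched as follows. This is a minimal illustration under assumed conditions: the paired measures, the starting point of the scale, and the interval width are hypothetical, and `interval_index` and `column_means` are names introduced here.

```python
from collections import defaultdict


def interval_index(value, start, width):
    """Index of the class interval of the first measure containing `value`."""
    return int((value - start) // width)


def column_means(pairs, start, width):
    """Mean of the second measure within each interval of the first."""
    columns = defaultdict(list)
    for x1, x2 in pairs:
        columns[interval_index(x1, start, width)].append(x2)
    return {k: sum(v) / len(v) for k, v in sorted(columns.items())}


# Hypothetical paired measures (first measure, second measure).
pairs = [(12, 30), (14, 34), (21, 40), (23, 44), (35, 60), (38, 62)]
means = column_means(pairs, start=10, width=10)

print(means)
```

Plotting these means as ordinates above the mid-points of the intervals gives the points to which the “best fitting” line is drawn.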

¹ An illustration of this type of representation is given in Chapter X.

Table IV. Illustrating a Correlation Table

[Table body not recoverable. Horizontal scale: first set of measures.
Surviving marginal entries: 1, 6, 7, 5, 9, 11, 10, 6, 9, 6, 4, 1, 1
(total 76); 7.5, 19.2, 23.9, 31.5, 31.9, 33.9, 43.0, 45.0, 47.5,
(two entries illegible), 58.8, 67.5, 67.5.]

The calculation of the product-moment coefficient of correlation.²
The product-moment coefficient of correlation developed
by Pearson is the most widely used numerical index of relationship.
It may be thought of as being defined¹ by the formula

\[ r_{12} = \frac{\sum \dfrac{x_1}{\sigma_1} \cdot \dfrac{x_2}{\sigma_2}}{N} \]

in which $x_1$ and $x_2$ represent the two sets of paired measures,
each measure being expressed as a deviation from the mean of
its distribution, $\sigma_1$ and $\sigma_2$ represent the standard deviations
of the two distributions, and N designates the number of pairs
of data, or the number of cases. The fraction $x_1 / \sigma_1$ is the ratio of a
measure to the standard deviation of its distribution. In other
words, this fraction represents the measure expressed in terms
of the standard deviation as a unit. Hence, the numerator of
the formula represents the sum of the products of the paired
measures when each is expressed in terms of the standard
deviation of its distribution as a unit. The formula may be
written

\[ r_{12} = \frac{\sum z_1 z_2}{N} \]

² The critical reader will be interested in the exposition of the assumptions
underlying this technique. See pages 101-03.

¹ This definition is offered as a convenient approach to the study of correlation.
Another approach is to define the coefficient of correlation as the constant
r in the linear regression equation. See page 100. It may also be defined as one
of the essential indices of the bivariate normal surface. For a brief historical
account of the development of correlation and references to original sources, see
Walker, H. M. Studies in the History of Statistical Method. Baltimore: The
Williams and Wilkins Company, 1929, Chapter V. The reader interested in
making intensive study of the product-moment coefficient of correlation will
find the following reference and the bibliography it includes helpful: Furfey,
P. H., and Daly, J. F. “The Interpretation of the Product-Moment Correlation
Coefficient,” Catholic University of America Educational Research Monographs,
Vol. 8, No. 4. Washington, D. C.: Catholic University Press, 1934. 57 pp.
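
The definition in terms of measures expressed in standard-deviation units may be sketched as follows. The function name `pearson_r_z` and the five pairs of measures are hypothetical; the standard deviations are computed with N (not N - 1) in the denominator, which is what the defining formula requires.

```python
import math


def pearson_r_z(x1, x2):
    """Mean product of paired measures, each in standard-deviation units."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    z1 = [(a - m1) / s1 for a in x1]
    z2 = [(b - m2) / s2 for b in x2]
    return sum(a * b for a, b in zip(z1, z2)) / n


# Hypothetical paired measures.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
r = pearson_r_z(x1, x2)

print(round(r, 2))
```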


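If the measures are expressed as deviations from assumed
means, as is usually the case when the calculations are made
from a correlation table, the following formula¹ indicates the
plan of calculation.

\[
r_{12}
= \frac{\dfrac{\sum x_1' x_2'}{N} - c_1 c_2}
       {\sqrt{\dfrac{\sum f_1 (x_1')^2}{N} - c_1^2}\;\sqrt{\dfrac{\sum f_2 (x_2')^2}{N} - c_2^2}}
= \frac{\dfrac{\sum x_1' x_2'}{N} - c_1 c_2}{\sigma_1 \sigma_2}
\]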

¹ Several other forms of the formula may be written. See Dunlap, J. W., and
Kurtz, A. K. Handbook of Statistical Nomographs, Tables, and Formulas. Yonkers-
on-Hudson: World Book Company, 1932, pp. 117-19. Fifty-two forms are given
by Symonds, P. M. “Variations of the Product Moment (Pearson) Coefficient of
Correlation,” Journal of Educational Psychology, 17:458-69, October, 1926.
A formula developed by Pearson for calculating the product-moment coefficient
utilizing the means of the columns or of the rows is given by Holzinger,
K. J. Statistical Methods for Students in Education. Boston: Ginn and Company,
1928, pp. 258-60. This formula may be expressed as

\[ r_{12} = \frac{\sum f_1 x_1' (M_2' - M_2)}{N \sigma_1 \sigma_2} \]

or

\[ r_{12} = \frac{\sum f_2 x_2' (M_1' - M_1)}{N \sigma_1 \sigma_2} \]

In the first formula $M_2'$ refers to the means of the columns and $M_2$ to the mean
of the total distribution of $X_2$, assuming that $X_2$ is distributed along the vertical
axis of the correlation table as in Figure 1. The numerator is then the sum of the
products of the differences and the corresponding totals of the columns and
deviations of the columns from the assumed mean of $X_1$. Corresponding statements
apply to the second formula. When employing either of these formulae, all
quantities should be expressed in terms of scale units rather than intervals.

The application of this formula to a correlation table is
illustrated in Figure 1. Heavy horizontal and vertical lines
have been drawn to mark off the scale interval in each distribution
within which the mean is assumed to lie. The correlation
table is extended at the right and at the bottom to provide for
the following quantities: deviations from the assumed means
designated by the symbols $x_1'$ and $x_2'$; products of each deviation
by its frequency expressed by the symbols $f_2 x_2'$ and $f_1 x_1'$; the
products of each frequency and the square of its deviation
expressed by the symbols $f_1 (x_1')^2$ and $f_2 (x_2')^2$. From the algebraic
sums of the appropriate ones of these columns and rows, the
standard deviations $\sigma_1$ and $\sigma_2$ are calculated¹ according to the
method illustrated on page 76. Two additional columns are
given at the right of the table. The values under the symbol
$\sum x_1'$ are obtained by multiplying the measures in each horizontal
row or array by their corresponding $x_1'$ deviations. For
example, 1 × 4 = 4 and (1 × 2) + (2 × 5) + (1 × 7) = 19.
The lowest horizontal row (to the right of 30) yields (1 × -5)
+ (1 × -3) = -8. The values under the symbol $\sum x_1' x_2'$ are
obtained by multiplying $\sum x_1'$ values by their corresponding
$x_2'$ deviations. For example, 4 × 7 = 28 and 19 × 6 = 114.
The horizontal rows following the symbols $\sum x_2'$ and $\sum x_1' x_2'$
provide checks for the sums of the columns labeled $f_2 x_2'$ and $\sum x_1' x_2'$.
The values in these rows are obtained similarly to those in
columns $\sum x_1'$ and $\sum x_1' x_2'$. For example, (1 × -6) + (1 × -8)
= -14, which multiplied by -5 equals +70. The sum of the
$\sum x_1'$ column should also be the same as the sum of the horizontal
row $f_1 x_1'$. Thus, in the illustration given, the values 43,
-348, and 2333, later used in calculation, are each obtained
twice.² Furthermore, the value for N, or 568, may be checked
by totaling the $f_2$ column and the $f_1$ row. The student should
take advantage of these checks and should not commence the
later steps in the calculation of the coefficient of correlation
until he has made them.

¹ It should be noted that the standard deviations are expressed in class intervals
as units, rather than in scale units. For purposes other than the calculation
of $r_{12}$, the standard deviations should be expressed in scale units. In the case of
the correlation table of Figure 1 each of the values should be multiplied by
5.0, the width of the class or scale interval.

² Through a mere coincidence, the sum of the negative values in the $\sum x_1'$
column equals -348. The checks mentioned refer to the $f_1 x_1'$ row and the $\sum x_1'$
column (43), the $\sum x_2'$ row and the $f_2 x_2'$ column (-348), and the $\sum x_1' x_2'$ row and
$\sum x_1' x_2'$ column (2333). No check is given for the $f_1 (x_1')^2$ row and the $f_2 (x_2')^2$ column.

The calculation of $c_1$, $c_2$, $\sigma_1$, and $\sigma_2$ requires no explanation.
The sign of $c_1^2$ and $c_2^2$ is always positive. The sign of $c_1 c_2$, however,
should be carefully noted. In the illustration it is negative,
since $c_1$ and $c_2$ are unlike in sign. The reader should observe
that calculations were carried to the fourth decimal place.
This facilitates rounding off correctly to two places. Reporting
a coefficient to more than two decimal places lends a false air
of accuracy which should be avoided.¹
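
The plan of calculation from a correlation table, with deviations taken in class intervals from assumed means, may be sketched as follows. This is an illustration only, not the table of Figure 1: the small frequency table is hypothetical, and `r_from_table`, `origin_row`, and `origin_col` are names introduced here for the function and for the positions of the assumed means.

```python
import math


def r_from_table(freq, origin_row, origin_col):
    """Product-moment r from a table of frequencies.

    freq[i][j] is the number of pairs in row i (second measure) and
    column j (first measure). Deviations are counted in class intervals
    from the row and column assumed to contain the means.
    """
    n = sum(sum(row) for row in freq)
    s1 = s2 = s11 = s22 = s12 = 0
    for i, row in enumerate(freq):
        d2 = i - origin_row
        for j, f in enumerate(row):
            d1 = j - origin_col
            s1 += f * d1
            s2 += f * d2
            s11 += f * d1 * d1
            s22 += f * d2 * d2
            s12 += f * d1 * d2
    c1, c2 = s1 / n, s2 / n  # corrections for the assumed means
    sigma1 = math.sqrt(s11 / n - c1 * c1)
    sigma2 = math.sqrt(s22 / n - c2 * c2)
    return (s12 / n - c1 * c2) / (sigma1 * sigma2)


# Hypothetical 3 x 3 correlation table.
freq = [[2, 1, 0],
        [1, 3, 1],
        [0, 1, 2]]
r_centered = r_from_table(freq, origin_row=1, origin_col=1)
r_shifted = r_from_table(freq, origin_row=0, origin_col=0)

print(round(r_centered, 2), round(r_shifted, 2))
```

The two calls use different assumed means and yield the same coefficient, since the corrections remove the effect of the choice.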

¹ See page 62.

Economy in calculating a coefficient of correlation. The construction
of a correlation table and the calculation of the coefficient
from it is a laborious process, especially when the
number of cases is large. Several statistical workers have devised
blank forms to facilitate and routinize the process.² The
use of a wisely planned correlation chart not only results in
considerable saving of time, but it also adds to the accuracy of
the computations by systematizing the work. Greater economies,
however, may be effected in other ways.

² Holzinger, K. J. Statistical Methods for Students in Education. Boston: Ginn
and Company, 1928, p. 155.

Justice, W. A. “Correlation Sheet.” Cincinnati: C. A. Gregory Company.

Kelley, T. L. “Kelley Correlation Chart.” Yonkers-on-Hudson, New York:
World Book Company. (A copy is pasted on the inside of the back cover of
Kelley's Interpretation of Educational Measurements.)

Lauer, A. R. “Simplex Correlation Form.” Minneapolis: The Educational
Test Bureau.

Otis, A. S. “The Otis Correlation Chart,” Journal of Educational Research,
8:440-48, December, 1923.

Otis, A. S. Statistical Method in Educational Measurement. Yonkers-on-
Hudson, New York: World Book Company, 1925, p. 195.

Thurstone, L. L. “A Data Sheet for the Pearson Correlation Coefficient,”
Journal of Educational Research, 6:49-56, June, 1922.

Toops, H. A. “A Printed Form for Computing the Standard Deviation on the
Adding Machine,” Journal of Educational Research, 12:56-58, June, 1925.

The tabulation of the data in a correlation table is not necessary
and the elimination of this step may result in considerable
saving of time, especially when the intercorrelations between
several paired variables are desired. Walker¹ has proposed a
procedure, the first step of which is to assemble the values of
each variable in a frequency distribution and then to calculate
the standard deviations. The paired measures are written in
parallel columns with space between the columns for writing
in the corresponding deviations in terms of intervals from the
assumed means. The products of the paired deviations are then
written in two columns, one for the positive values and one for
the negative. The algebraic sum of the totals of these columns
will be the term $\sum x_1' x_2'$. The corrections obtained in calculating
the standard deviations are those required for completing the
formula

\[ r_{12} = \frac{\dfrac{\sum x_1' x_2'}{N} - c_1 c_2}{\sigma_1 \sigma_2} \]


¹ Walker, J. F. “Short Method for Finding Zero Order Coefficients of
Correlation,” Journal of Educational Psychology, 21:65-67, January, 1930.

When a suitable calculating machine is available, another
form of the formula affords a convenient procedure. If
$\sqrt{\sum x_1^2 / N}$ and $\sqrt{\sum x_2^2 / N}$ are substituted for $\sigma_1$ and $\sigma_2$, we obtain

\[ r_{12} = \frac{\sum x_1 x_2}{\sqrt{\sum x_1^2}\,\sqrt{\sum x_2^2}} \]

Also $x_1 = X_1 - M_1$ and $x_2 = X_2 - M_2$. Substituting these
equivalents,

\[
r_{12}
= \frac{\sum (X_1 - M_1)(X_2 - M_2)}
       {\sqrt{\sum (X_1 - M_1)^2}\,\sqrt{\sum (X_2 - M_2)^2}}
= \frac{\sum X_1 X_2 - \sum X_1 M_2 - \sum X_2 M_1 + \sum M_1 M_2}
       {\sqrt{\sum X_1^2 - 2\sum X_1 M_1 + \sum M_1^2}\,
        \sqrt{\sum X_2^2 - 2\sum X_2 M_2 + \sum M_2^2}}
\]

$M_2$ being a constant, $\sum X_1 M_2 = M_2 \sum X_1$. Since $M_1 = \sum X_1 / N$,
$\sum X_1 = N M_1$. Hence, $M_2 \sum X_1 = M_2 (N M_1)$. $\sum X_2 M_1 = M_1 \sum X_2
= M_1 (N M_2)$. Since $M_1$ and $M_2$ are constants, $\sum M_1 M_2$ is equal to
$N M_1 M_2$, $\sum M_1^2$ is equal to $N M_1^2$, and $\sum M_2^2$ is equal to $N M_2^2$.
Similarly, $\sum X_1 M_1 = N M_1^2$ and $\sum X_2 M_2 = N M_2^2$. Substituting
these equivalents, the equation simplifies to the form

\[ r_{12} = \frac{\sum X_1 X_2 - N M_1 M_2}{\sqrt{\sum X_1^2 - N M_1^2}\,\sqrt{\sum X_2^2 - N M_2^2}} \]
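
As a check on this form of the formula, a small worked case may be helpful. The five pairs of measures are hypothetical: $X_1 = 1, 2, 3, 4, 5$ and $X_2 = 2, 4, 5, 4, 5$, so that $N = 5$, $M_1 = 3$, and $M_2 = 4$.

\[
\begin{aligned}
\sum X_1 X_2 &= 2 + 8 + 15 + 16 + 25 = 66, & N M_1 M_2 &= 5 \cdot 3 \cdot 4 = 60, \\
\sum X_1^2 - N M_1^2 &= 55 - 45 = 10, & \sum X_2^2 - N M_2^2 &= 86 - 80 = 6, \\
r_{12} &= \frac{66 - 60}{\sqrt{10}\,\sqrt{6}} = \frac{6}{\sqrt{60}} \approx .77
\end{aligned}
\]

Only the sums of the measures, of their squares, and of their products are needed; no deviation from a mean is computed for any single measure.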

The calculations indicated by this formula may be made
from a correlation table,¹ but unless N is very large, it is more
economical to make them directly from the data records,² that
is, from the series of paired measures of $X_1$ and $X_2$. However,
if the measures are expressed in terms of large numbers, the
calculations will be cumbersome, even when machines are
available for making them. The labor involved can be reduced
by transmuting the measures into smaller numbers.³ The
smallest measure of each series may be subtracted from all the
other measures in that series, i.e., the smallest value of $X_1$ may
be subtracted from all the values of $X_1$, and the smallest value
of $X_2$ may be subtracted from all the values of $X_2$. It is not
essential in using this formula that the measures in either series
be arranged in order of magnitude. It is only essential that
the measures in the original or in the reduced series remain
correctly paired. Another plan is to form a scale of intervals
such as would be used if the data were to be tabulated in a
correlation table. The first interval may be given a value of 1,
the second a value of 2, and so on. These values may be substituted
for the measures in the respective intervals. The process
is called coding.

¹ If standard deviations are desired in addition to $r_{12}$, divide numerical values
of $\sum X_1^2 - N M_1^2$ and $\sum X_2^2 - N M_2^2$ by N before extracting square roots.

For a description of the technique, see Ackerson, Luton. “A Pearson-r Form
for Use with Calculating Machines,” Journal of Educational Psychology, 19:58-60,
January, 1928.

² For an illustration see Ayres, L. P. “Shorter Method for Computing the
Coefficient of Correlation,” Journal of Educational Research, 1:216-21, March,
1920.

For detailed instructions for using a Monroe Calculating Machine with a
minor modification of this formula, see Tremmel, E. E., and Weidemann, C. C.
“A Machine Method of Calculating the Pearson Correlation Coefficient,”
University of Nebraska Publication, No. 72. Lincoln, Nebraska: Extension
Division, University of Nebraska, June, 1930. 15 pp.

³ Ayres, L. P. “Substituting Small Numbers for Large Ones in the Computation
of Coefficients of Correlation,” Journal of Educational Research, 2:502-04,
June, 1920.

Toops, Herbert A. “Computing Intercorrelations of Tests on the Adding
Machine,” Journal of Applied Psychology, 6:172-84, June, 1922.
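
That subtracting the smallest measure from each series leaves the coefficient unchanged may be verified by a short sketch. The paired measures are hypothetical, and `pearson_r` and `reduce_series` are names introduced here; the coefficient is computed by the raw-score form, that is, from the sums of the measures, their squares, and their products.

```python
import math


def pearson_r(x1, x2):
    """Raw-score form of the product-moment coefficient."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    numerator = sum(a * b for a, b in zip(x1, x2)) - n * m1 * m2
    denominator = math.sqrt(sum(a * a for a in x1) - n * m1 * m1) * math.sqrt(
        sum(b * b for b in x2) - n * m2 * m2
    )
    return numerator / denominator


def reduce_series(values):
    """Subtract the smallest measure from every measure in the series."""
    smallest = min(values)
    return [v - smallest for v in values]


# Hypothetical paired measures expressed in large numbers.
x1 = [1012, 1025, 1031, 1040, 1047]
x2 = [508, 515, 512, 530, 526]

r_original = pearson_r(x1, x2)
r_reduced = pearson_r(reduce_series(x1), reduce_series(x2))

print(round(r_original, 4), round(r_reduced, 4))
```

Coding into intervals, by contrast, groups the measures and may therefore alter the coefficient slightly; the sketch covers only the subtraction of a constant.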

When making calculations from untabulated measures, one
should be systematic. It is helpful to list the measures, their
squares, and their products under the following heads: $\sum X_1$,
$\sum X_2$, $\sum X_1^2$, $\sum X_2^2$, and $\sum X_1 X_2$, with the totals appearing at the
bottoms of the columns. The means $M_1$ and $M_2$ are obtained
by dividing $\sum X_1$ and $\sum X_2$ by N. Where several coefficients
are to be calculated from a number of series referring to the
same individuals, the table may be extended to include the
following heads: $\sum X_1$, $\sum X_2$, $\sum X_3$, ..., $\sum X_n$; $\sum X_1^2$, $\sum X_2^2$, $\sum X_3^2$, ...,
$\sum X_n^2$; $\sum X_1 X_2$, $\sum X_1 X_3$, $\sum X_2 X_3$, ..., $\sum X_{n-1} X_n$. This procedure
eliminates duplication of calculation.

The term $\sum X_1 X_2$ is inconvenient to compute, even with the aid
of a calculating machine, but it may be eliminated by a further
transformation of the formula for the coefficient of correlation.¹

\[
r_{12} = \frac{\sigma_1}{2\sigma_2} + \frac{\sigma_2}{2\sigma_1}
       - \frac{\dfrac{\sum D^2}{N} - (M_1 - M_2)^2}{2\sigma_1 \sigma_2}
\]

¹ Huffaker, C. L. “A Note on Statistical Methods,” Journal of Educational
Psychology, 16:265-66, April, 1925.

Orleans, J. S. “Correlation without Plotting,” Journal of Educational
Psychology, 18:310-17, May, 1927.

Cureton, E. E. “Computation of Correlation Coefficients,” Journal of
Educational Psychology, 20:588-601, November, 1929.

For a plotting and checking technique based upon a modification of this
formula, see

Anderson, L. D., and Toops, H. A. “A New Apparatus for Plotting and a
Checking Method for Solving Large Numbers of Intercorrelations,” Journal of
Educational Psychology, 19:650-57, December, 1928; 20:36-43, January, 1929.

This formula, in which D designates the difference between the
measures of a pair, appears formidable, but examination of it
will reveal that the calculations called for are simple. If the
standard deviations and the means are desired for other purposes,
the calculation of the coefficient of correlation by means
of this formula requires only the summation of the squares of
the differences and the other indicated operations. If one set of
the raw measures is transmuted so that $\sigma_1 = \sigma_2$ and $M_1 = M_2$,
the formula becomes

\[ r_{12} = 1 - \frac{\sum D^2}{2 N \sigma_1^2} \]

Feldstein¹ has described the technique for carrying out the
operations required by this formula. He reports that a relatively
inexperienced statistical worker computed the forty-five
intercorrelations for ten variables, N = 100, in a total of six
hundred thirty-five minutes. This is an average of about fourteen
minutes per coefficient of correlation.
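
The difference method may be sketched as follows, using the identity that the mean of the squared differences equals the sum of the two variances, less twice the product of r and the two standard deviations, plus the square of the difference between the means. The function name `r_from_differences` and the data are hypothetical, and the sketch is not a reproduction of Feldstein's machine routine.

```python
import math


def r_from_differences(x1, x2):
    """Product-moment r from means, standard deviations, and the sum of D^2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s1 = math.sqrt(sum(a * a for a in x1) / n - m1 * m1)
    s2 = math.sqrt(sum(b * b for b in x2) / n - m2 * m2)
    mean_d_squared = sum((a - b) ** 2 for a, b in zip(x1, x2)) / n
    return (
        s1 / (2 * s2)
        + s2 / (2 * s1)
        - (mean_d_squared - (m1 - m2) ** 2) / (2 * s1 * s2)
    )


# Hypothetical paired measures.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
r = r_from_differences(x1, x2)

print(round(r, 2))
```

No product of paired measures is formed at any point; only the squared differences are summed.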

¹ Feldstein, M. J. “A New Technique for Machine Computation of Coefficients
of Correlation,” Journal of Experimental Education, 2:278-82, March,
1934.

In some situations the calculation of the tetrachoric coefficient
of correlation may advantageously be substituted for
the calculation of the product-moment coefficient of correlation.
For the calculation of a tetrachoric coefficient, the data
are tabulated in a 2 × 2 table and under certain conditions it
is equivalent to the product-moment coefficient of correlation.²

² The calculation of tetrachoric coefficients of correlation is most readily
accomplished through the use of diagrams contained in the following publication:

Chesire, Leone, Saffir, Milton, and Thurstone, L. L. Computing Diagrams for
the Tetrachoric Correlation Coefficient. Chicago: University of Chicago
Bookstore, 1933.

The computation of coefficients of correlation from tabulating
machine cards has been described by Mendenhall and
Warren.³ A machine of somewhat different type has been
constructed by Hull.⁴

³ Mendenhall, H. M., and Warren, Richard. “Computing Statistical Coefficients
from Punched Cards,” Journal of Educational Psychology, 21:53-62,
January, 1930.

⁴ Hull, C. L. “An Automatic Correlation Calculating Machine,” Journal of
the American Statistical Association, 20:522-31, December, 1925.

Although the techniques referred to in the preceding paragraphs
are economical of time, the elimination of the correlation
table from the process may be a handicap in interpreting
the coefficient of correlation as an index of the relationship
between the two sets of paired measures. Furthermore, the
correlation table is useful as a means of spotting probable cases
of non-linear relationship to which the product-moment coefficient
is not applicable. Hence, the elimination of the correlation
table is not to be recommended, except when the saving
of time is an important consideration. In general, the investigator
who computes coefficients of correlation only occasionally
should follow the procedure illustrated on pages 90-91 or some
minor modification of it.

Error due to grouping in broad categories. In the calculation
from the correlation table on page 90, the mid-point of an interval
was taken as the average value of the measures in it.
In a normal distribution the mean value of the measures in
an interval is slightly nearer the mean of the distribution than
the mid-point of the interval. Hence, the standard deviations
in the denominator of the formula for the coefficient of correlation
will be slightly too large and the value of r will be decreased.
Sheppard’s correction¹ may be applied, but if the correlation
table is not less than 12 × 12 or 10 × 10, the error is relatively
small.² For a table smaller than 10 × 10, the error due to broad
categories or intervals is large enough to be a matter of considerable
concern, especially when N is less than 50. When the
coefficient of correlation is calculated between two sets of
school marks or other measures expressed in terms of a small
number of categories by the method described on pages 90-91, an
investigator should recognize that the result is likely to involve
a relatively large error. In such cases the method of tetrachoric
correlation or polychoric correlation may be used.³

¹ See page 154.

² For a 10 × 10 table, the error is about 4 per cent.

Camp, B. H. The Mathematical Part of Elementary Statistics. Boston: D. C.
Heath and Company, 1931, p. 301.

See also Kelley, T. L. Statistical Method. New York: The Macmillan Company,
1923, pp. 167 f.

³ Camp, op. cit., pp. 302 f.

See also Holzinger, K. J. Statistical Methods for Students in Education. Boston:
Ginn and Company, 1928, p. 263.

Studying the relationship between two sets of paired measures
when it is non-linear. The derivation of the formula for the
product-moment coefficient of correlation is based on the
assumption that the relationship is linear, which means that
the line joining the points located in a graphical representation
of the means of the columns or the rows approaches a straight
line, rather than a section of a parabola or some other curve.
When the relationship is non-linear, the correlation ratio should
be calculated. When the calculation is made from a correlation
table, the following formulae indicate the procedure.¹

\[ \eta_{12}^2 = \frac{\dfrac{1}{N}\sum \dfrac{(\sum x_1')^2}{f_2} - c_1^2}{\sigma_1^2} \]

(Correlation ratio for means of arrays or rows, curvilinear
correlation of $x_1$ on $x_2$)

\[ \eta_{21}^2 = \frac{\dfrac{1}{N}\sum \dfrac{(\sum x_2')^2}{f_1} - c_2^2}{\sigma_2^2} \]

(Correlation ratio for means of columns, curvilinear correlation of
$x_2$ on $x_1$)

¹ See page 252 for a different form of these formulae.

In calculating the ratios of correlation by means of these
formulae, two columns and two rows are added to the correlation
table. (See page 90.) The first of the columns is headed
$(\sum x_1')^2$ and its values are obtained by squaring those appearing
in the $\sum x_1'$ column. The second of the two new columns is
headed $(\sum x_1')^2 / f_2$. Its values are obtained by dividing the values
in the $(\sum x_1')^2$ column by the corresponding values in the $f_2$
column. The new horizontal rows, appearing at the bottom of
the table, $(\sum x_2')^2$ and $(\sum x_2')^2 / f_1$, are obtained similarly. The final
steps in the calculation of the ratios should be evident from the
formulae.²

² The reader who desires additional information will find it helpful to consult:

Odell, C. W. Educational Statistics. New York: The Century Company,
1925, pp. 209 f.

The coefficient of correlation may be thought of as measuring
the deviations from the straight line which best fits the means of
the columns or rows. The correlation ratio may be thought of as
measuring the deviations from the curved line which best fits
the means of the rows or columns. It should be noted that for
any tabulation there are two correlation ratios: one that relates
to the curve best fitting the means of the columns and the other
relating to the curve that best fits the means of the rows.
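
A correlation ratio may be sketched directly from grouped measures as the square root of the variance of the array means, each weighted by its frequency, divided by the total variance. The sketch below assumes that the values of one measure have already been sorted into arrays according to the interval of the other; the name `correlation_ratio` and the data are hypothetical.

```python
import math


def correlation_ratio(arrays):
    """Eta for the means of the arrays.

    `arrays` holds the values of one measure, grouped by the interval
    of the other measure into which each pair falls.
    """
    values = [v for array in arrays for v in array]
    n = len(values)
    grand_mean = sum(values) / n
    total = sum((v - grand_mean) ** 2 for v in values)
    between = sum(
        len(array) * (sum(array) / len(array) - grand_mean) ** 2
        for array in arrays
    )
    return math.sqrt(between / total)


# Hypothetical arrays whose means rise and then fall (a curved relationship).
arrays = [[1, 2, 3], [5, 6, 7], [2, 3, 4]]
eta = correlation_ratio(arrays)

print(round(eta, 2))
```

Exchanging the roles of the two measures and regrouping gives the second correlation ratio, which in general differs from the first.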

Holzinger, K. J. Statistical Methods for Students in Education. Boston:
Ginn and Company, 1928, Chapter X.

A simplified method of calculation has been reported by Dvorak. Dvorak,
August. “A Simplified Computation of Non-Linear Correlation,” Journal of
Educational Research, 25:99-104, February, 1932. A chart to facilitate this
simplified method of calculation is published by Longmans, Green and Company,
New York.

The regression equation as an expression of relationship.
On page 86 the suggestion was made that the relationship
between two sets of paired measures might be summarized by
representing means of the columns of the correlation table
graphically as ordinates above the mid-points of the intervals
of the horizontal scale of the table and then drawing the straight
line that “best fits” the points thus located. For some purposes
it is desirable to determine the equation of this line, which will
express the relation between the “average” value of the measures
of one set associated with a given measure of the other set.
If $\bar{X}_1$ is employed to represent these “average” values, the
form of the equation will be

\[ \bar{X}_1 = m X_2 + C \]

In general the corresponding values of $X_1$ and $\bar{X}_1$ will not be
equal and the “best fitting” line is defined as the one for which
$\sum (X_1 - \bar{X}_1)^2$ is a minimum. When the relationship is linear,
the equation of this line, called the regression equation,¹ is

\[ \bar{X}_1 = r_{12}\frac{\sigma_1}{\sigma_2} X_2 - r_{12}\frac{\sigma_1}{\sigma_2} M_2 + M_1 \]

This equation may be written in the following forms:

\[ \frac{\bar{X}_1 - M_1}{\sigma_1} = r_{12}\,\frac{X_2 - M_2}{\sigma_2} \]

\[ \bar{X}_1 - M_1 = r_{12}\frac{\sigma_1}{\sigma_2}(X_2 - M_2) \]

The relation between the “average” values of $X_2$ and the
corresponding values of $X_1$ is given by the equation

\[ \bar{X}_2 = r_{12}\frac{\sigma_2}{\sigma_1} X_1 - r_{12}\frac{\sigma_2}{\sigma_1} M_1 + M_2 \]

The use of the regression equation as a formula for prediction
is dealt with in Chapter X. Multiple correlation and multiple
regression are also considered therein.

¹ For an explanation of the derivation of this equation see Holzinger, K. J.
Statistical Methods for Students in Education. Boston: Ginn and Company,
1928, pp. 158 f.

Use of the word regression to designate this equation is due to Francis Galton,
who studied the inheritance of traits and found that offspring tend to resemble
the mid-parent, “average of the measures of father and mother.” This tendency
was called regression. When the relationship was expressed in equation form,
the term regression was used to identify it. For a brief account of Galton’s work
see Kelley, T. L. Statistical Method. New York: The Macmillan Company, 1923,
pp. 152 f.

Techniques for the study of relationships involving other
types of data.¹ In the preceding descriptions, both sets of data
have been expressed in quantitative terms. One or both sets
of the data may be rank orders or categorical classifications.
If a normal distribution is assumed, ranks may be transformed
into quantitative measures and the product-moment coefficient
of correlation or the correlation ratio is then an appropriate
technique. When both sets of data are in terms of ranks, a
measure of the relationship may be calculated by the formula

\[ \rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} \]

in which d represents the difference in the rank of the two
measures of a pair. Measures in quantitative terms may be
transformed into ranks and this formula applied. Values of
ρ (rho) differ slightly from the corresponding values of r.¹

¹ The limitations of space prohibit the description of these techniques. The
interested reader may consult the following references:

Kelley, T. L. Statistical Method in Education. New York: The Macmillan
Company, 1923, pp. 231-78.

Holzinger, K. J. Statistical Methods in Education. Boston: Ginn and Company,
1928, pp. 231-78.

Walker, Helen M. Studies in the History of Statistical Method. Baltimore:
The Williams and Wilkins Company, 1929, pp. 125-41. This reference consists
mainly of an annotated bibliography.

These references describe certain techniques in addition to those mentioned
here. Dunlap, J. W., and Kurtz, A. K. Handbook of Statistical Nomographs,
Tables, and Formulas. Yonkers-on-Hudson, New York: World Book Company,
1932, pp. 122-25, may be consulted for formulae.
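
The rank-difference method may be sketched as follows: quantitative measures are first transformed into ranks and the sum of the squared rank differences is then entered in the usual formula, rho = 1 - 6Σd² / N(N² - 1). The helper `ranks` assumes that no two measures in a series are tied, which is a simplification made here; the data and function names are hypothetical.

```python
def ranks(values):
    """Rank 1 for the largest measure; assumes no tied measures."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_difference_rho(x1, x2):
    """Rho from the squared differences between paired ranks."""
    n = len(x1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x1), ranks(x2)))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical paired measures in quantitative terms.
x1 = [10, 20, 30, 40, 50]
x2 = [12, 25, 22, 48, 41]
rho = rank_difference_rho(x1, x2)

print(round(rho, 2))
```

Tied measures would require averaged ranks, which this sketch does not attempt.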

If one set of data is in terms of quantitative measures and
the other is in terms of categorical classifications such as grade
levels, geographical areas, types of school organization, and the
like, the correlation ratio may be calculated by means of a
formula given in Chapter VIII, page 252. If there are only two
categories (dichotomous classification), bi-serial r is used as the
measure of relationship. See page 236 for formula.

If one set of data consists of ranks and the other of categorical
classifications, the ranks may be transformed into quantitative
measures if the assumption of a normal distribution appears
justified. When this is done, the correlation ratio may be
calculated. If both sets of data are categorical classifications, the
coefficient of contingency gives a measure of the relationship.
If both of the classifications are dichotomous, the tetrachoric
coefficient of correlation may be calculated.

¹ The relationship is $r = 2 \sin\left(\dfrac{\pi}{6}\rho\right)$.

Kelley, op. cit., p. 193, gives a table of the corresponding values.

Assumptions relating to the calculation of the coefficient of
correlation.² The assumptions relating to the product-moment
coefficient of correlation can best be understood if one thinks of
the usual correlation table. The only assumption made in the
derivation of the formula is that the distribution of the frequencies
in this table exhibits a linear relationship rather than one that
is curvilinear. Cases of marked non-linearity may be identified
from an inspection of the correlation table, but for precise
determinations the Blakeman test¹ should be applied, and if a
satisfactory degree of linearity is not shown, the product-moment
coefficient should not be used.

² For a report of a study of the extent to which assumptions are ignored in
practice, see

Furfey, P. H., and Daly, J. F. “Product-Moment Correlation as a Research
Technique in Education,” Journal of Educational Psychology, 26:206-11,
March, 1935.
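
The short form of the Blakeman test may be sketched as follows. It evaluates N(η² - r²) and compares the result with a limit, 11.37 being the more conservative of the two limits commonly quoted and 16.38 the more lenient. The function names and the sample values of N, η, and r are hypothetical.

```python
def blakeman_value(n, eta, r):
    """N times the difference between eta squared and r squared."""
    return n * (eta ** 2 - r ** 2)


def is_sufficiently_linear(n, eta, r, limit=11.37):
    """True when the departure from linearity falls below the chosen limit."""
    return blakeman_value(n, eta, r) < limit


# Hypothetical results: the same eta and r from samples of two sizes.
small_sample = is_sufficiently_linear(100, eta=0.60, r=0.50)
large_sample = is_sufficiently_linear(200, eta=0.60, r=0.50)

print(small_sample, large_sample)
```

The same difference between η and r that passes the test for 100 cases fails it for 200, since the expression grows in proportion to N.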

When a coefficient of correlation is calculated from a correlation
table, the assumption is made that the measures falling
within the several cells of the table are uniformly distributed
within the areas. This assumption is not completely satisfied
when the distributions are normal, and frequently the departure
from it is much greater, especially when N is not large. The
effect of non-conformity with this assumption is indicated by the
fact that different choices of intervals for the correlation table
may yield coefficients varying as much as .10 or more.² Sheppard’s
correction for coarse grouping will usually give a more
correct coefficient,³ but unless N is greater than 100, a better
procedure is to make the calculations directly from the data by
means of the formula on page 94.

In our use of the standard error of estimate^ two additional 
assumptions arc introduced The first of these requirements is 
homoscedasticity, which means that the variabilities (standard 
deviations) of the several arrays (columns and rows) of the cor- 

1 The Blakoman tost for Imeanty is based on tho difference between the cor- 
relation ratio iv) and the coefficient of correlation (r). The expression to be 
evaluated may be written 

When the relationship is perfectly linear, the value of this expression is zero, 
and as the relationship becomes curvilinear, the value of the expression indicates 
the degree of departure from linearity Although there is not complete agree- 
ment between authorities, the product-moment coefficient probably can be 
safely used when 

r2) < 16 38 

A more conservative limit is 11 37 and it is wise to use this limit when N is not 
large For a longer form of Blakeman’s test see Holzinger, op cit , p 183 

2 Lauer, A. E. “An Empirical Study of the Effects of Grouping Data and
Calculation of r by the Pearson Product-Moment Method,” Journal of Applied
Psychology, 14:182-89, April, 1930. This reference gives the results of computing
coefficients of correlation from different groupings of the same data. It
includes an excellent summary statement of the factors which may affect the
coefficient of correlation calculated from a given group of data.

3 See pages 153 f.

4 See page 333.
relation table are equal. The second is that the correlation
surface¹ is normal. Although neither of these requirements is
involved in the calculation of r, their introduction in our use of
the standard error of estimate has associated them with the
coefficient of correlation, and for practical purposes it is wise
to regard them as implied assumptions which should be approximated.
A coefficient of correlation is thought of as an index
descriptive of the correlation table. If the correlation surface
is not approximately homoscedastic and normal, the coefficient
is likely to be misleading. In an abnormal correlation table a few
pairs of measures may affect the coefficient of correlation to such
an extent that their omission would materially change the calculated
value. The situation is analogous to the use of the mean
as a measure of the central tendency of a distribution in which a
few extreme measures cause the difference between the mean
and the median to be relatively large.
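
The sensitivity of the coefficient to a few pairs of measures can be illustrated with a minimal sketch. The scores below are hypothetical and were chosen only for illustration; the function name `pearson_r` is likewise an assumption, and it simply computes the product-moment coefficient directly from the paired measures.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment coefficient computed directly from paired measures."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: ten pairs showing little relationship.
x = [10, 12, 11, 13, 12, 10, 11, 13, 12, 11]
y = [21, 20, 23, 21, 22, 22, 20, 22, 21, 23]

r_without = pearson_r(x, y)                # the ten ordinary pairs only
r_with = pearson_r(x + [30], y + [45])     # one extreme pair added

print(round(r_without, 2), round(r_with, 2))
```

With these figures a single extreme pair changes a slightly negative coefficient into one above .90, which is the kind of material change the paragraph describes.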

E. Measures of the Effect of Chance
in Random Sampling

The probable limits of the value of a statistic for a universe²
when calculations have been made from a random sample. An
investigator necessarily works with a limited collection or sample
of data, but frequently he is interested in the value of certain
statistics for a larger population or universe. In other words,
he wishes to generalize from his findings. If the sample is a
random one,³ it is possible to determine the probable limits of
the value of the statistic for the universe from which the sample
was taken or is considered to be taken.

1 The term “correlation surface” implies representation in three dimensions,
the frequencies in the correlation table being represented as vertical distances.
The correlation surface is normal when all of the arrays, both columns and rows,
form normal distributions.

2 A universe is defined as an infinitely large population. The formulae given
in the following paragraphs are based on the assumption of a random sample
from a universe. They, however, may be used with random samples from large
finite populations. In such cases the calculated value of the standard or probable
error will tend to be slightly too large. This condition is not undesirable because
it merely makes the probable limits slightly conservative. See footnote, page 108.

3 See pages 57 f. and 155 f. for discussion of random sampling.
If a series of similar but independent random samples are
taken from a universe, the values of a statistic, such as the mean,
will not be identical, but will tend to form a normal distribution
whose mean approaches the value of the statistic for the
universe.¹ The standard deviation of the distribution of the means
of an infinite number of similar but independent random samples
is given by the formula²

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

The median deviation, commonly designated as the probable
error (PE), is usually a more convenient measure of the
variability of this distribution. The formula is

$$PE_M = .6745\,\frac{\sigma}{\sqrt{N}}$$



1 For an illustration using a large finite population see Chaddock, R.
Principles and Methods of Statistics. Boston: Houghton Mifflin Company,
1925, pp. 232 f.

2 When the data are fallible, i.e., involve variable errors of measurement,
this formula gives a measure of the combined effect of sampling and of such
errors. See page 157 for formulae for the effect of sampling alone when the data are
fallible.

3 When a coefficient of correlation has been corrected for attenuation (see
page 151) the correct formula varies with the one used to secure correction for
attenuation. See Kelley, T. L. Statistical Method. New York: The Macmillan
Company, 1923, pp. 210 f.
The probable error of a proportion or per cent (p) is¹

$$PE_p = .6745\sqrt{\frac{pq}{N}}$$

in which q is defined as 1 - p.

The probable error of the difference between means of two
samples is found by the formula²

$$PE_{D_{1-2}} = \sqrt{PE_{M_1}^2 + PE_{M_2}^2 - 2r_{12}\,PE_{M_1}PE_{M_2}}$$

If the two sets of measures are not correlated, i.e., if $r_{12} = 0$,
the formula simplifies to

$$PE_{D_{1-2}} = \sqrt{PE_{M_1}^2 + PE_{M_2}^2}$$

The probable error of the difference of two proportions is
similar.
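
The probable error formulae of this section may be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names and the numerical figures are hypothetical, and the general form of the difference formula (with the correlation term) is taken to be the usual textbook one, which reduces to the simplified form when the correlation is zero.

```python
from math import sqrt

PE_FACTOR = 0.6745  # probable error = .6745 x standard error (normal curve)

def se_mean(sigma, n):
    """Standard error of a mean: sigma / sqrt(N)."""
    return sigma / sqrt(n)

def pe_mean(sigma, n):
    """Probable error of a mean: .6745 * sigma / sqrt(N)."""
    return PE_FACTOR * se_mean(sigma, n)

def pe_proportion(p, n):
    """Probable error of a proportion p, with q = 1 - p."""
    q = 1 - p
    return PE_FACTOR * sqrt(p * q / n)

def pe_difference(pe1, pe2, r12=0.0):
    """Probable error of a difference between two means.

    Assumed general form: sqrt(PE1^2 + PE2^2 - 2*r12*PE1*PE2).
    With r12 = 0 (uncorrelated measures) it is sqrt(PE1^2 + PE2^2).
    """
    return sqrt(pe1 ** 2 + pe2 ** 2 - 2 * r12 * pe1 * pe2)

# Hypothetical samples: sigma = 12 with N = 144, and sigma = 10 with N = 100.
pe_a = pe_mean(sigma=12.0, n=144)
pe_b = pe_mean(sigma=10.0, n=100)
print(pe_a, pe_b, pe_difference(pe_a, pe_b))
print(pe_proportion(p=0.5, n=100))
```

A positive correlation between the two sets of measures reduces the probable error of the difference, which is why the simplified form is the more conservative of the two when the correlation is positive.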

The values of a statistic calculated from similar but independent
random samples form a normal distribution and hence
the relationship between abscissa distances and corresponding
areas under the normal curve may be applied. See page 80.
Fifty per cent of these values of a statistic would be within
1.00 PE of the mean of the distribution (the value of the statistic
for the universe). Hence, if the value calculated from a single
random sample is taken as an estimate of the value of the statistic
for the universe, the chances are just even (50 to 50) that the
value of the statistic for the universe will be within ±1.00 PE
of the estimate. These limits are commonly expressed by connecting
the calculated value with its probable error by a plus or
minus sign (±). As a means of distinguishing between the

1 This is the formula usually given. A simple random sampling is assumed.
For modified random selection, the formula is different. See Mudgett, B. D.,
and Gevorkiantz, S. R. “Reliability of Forest Surveys,” Journal of the American
Statistical Association, 29:257-81, September, 1934.

2 This is the formula usually given. In the derivation the coefficient of
correlation in the product term appears as $r_{M_1M_2}$, but if the assumptions of random
sampling are met, it is equivalent to $r_{12}$. For proof see Walker, Helen M.
“A Note on the Correlation of Averages,” Journal of Educational Psychology,
19:636-42, December, 1928.
statistic for the sample and that of the universe, ~ (read curl)¹
may be written over the usual symbol. For example, if the mean
of a random sample is 57.00 and its probable error is 1.50, we
would write $M = 57.00$ and $\tilde{M} = 57.00 \pm 1.50$. The expression
$57.00 \pm 1.50$ is not to be interpreted literally but rather as a
statement of the probable limits of the mean of the universe.
By means of a table of the values of the probability integral,
described on page 80, the probabilities for other limits may
be determined. The probabilities corresponding to the limits
defined by various multiples of the probable error² are given in
Table V.


Table V. The Probabilities That the Value of a Statistic for
a Universe Lies within the Interval Formed by Subtracting
and Adding a Multiple of Its Probable Error

Probability    Interval
-----------    ------------------------------------
1 to 1         statistic - PE  to statistic + PE
4.6 to 1       statistic - 2PE to statistic + 2PE
22 to 1        statistic - 3PE to statistic + 3PE
142 to 1       statistic - 4PE to statistic + 4PE
1340 to 1      statistic - 5PE to statistic + 5PE
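
The odds in Table V follow from the areas under the normal curve. The following is a minimal sketch of that computation, assuming a normal sampling distribution and the usual factor .6745 relating the probable error to the standard error; the function names are hypothetical.

```python
from math import erf, sqrt

PE_FACTOR = 0.6745  # one probable error expressed in standard-error units

def probability_within(k):
    """Area of the normal curve lying within k probable errors of its mean."""
    z = k * PE_FACTOR
    return erf(z / sqrt(2))

def odds_within(k):
    """Chances (to 1) that the universe value lies within +/- k PE."""
    p = probability_within(k)
    return p / (1 - p)

for k in range(1, 6):
    print(f"{k} PE: {odds_within(k):.1f} to 1")
```

The computed odds agree with the tabled values to the precision given there.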


The meaning of a difference³ depends upon its sign. Hence,
an investigator is interested in ascertaining the probabilities
that the difference for the universe has the same sign as the
obtained difference. When computations have been made from
random samples, the ratio of a difference to its probable error
has been proposed as a convenient statistic. For a ratio of
1.00 the chances of the difference for the universe having the
same sign are 75 in 100; for a ratio of 1.50, 84 in 100; for a
ratio of 2.00, 91 in 100; for a ratio of 3.00, 98 in 100; for a ratio
of 4.00, 997 in 1000. The theoretical probabilities for various
values of this ratio, commonly called critical ratio (CR), have

1 This symbol is not in general use but the need for it is obvious. It is used in
Camp, B. H. The Mathematical Part of Elementary Statistics. Boston: D. C.
Heath and Company, 1931, p. 241.

2 The standard error might be used, but it is less convenient.

3 A similar statement may be made with reference to the coefficient of
correlation. See page 117.
been determined and assembled in table form.¹ Such a table
affords a means of interpreting a difference obtained from two
independent random samples with reference to the difference
for the corresponding universes, provided the difference is not
influenced by systematic errors of measurement or other data
faults. When the critical ratio is 4.00 or greater, the difference
is commonly called statistically significant. This is a technical
term and should not be interpreted as meaning that the difference
is necessarily dependable or is significant in a practical
sense.² For further discussion of the interpretation of differences,
see pages 237, 244 f., and 305 f.
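
The chances attached to a critical ratio can be computed rather than read from a table. The sketch below assumes a normal sampling distribution of the difference and converts the ratio of a difference to its probable error into standard-error units by the factor .6745; the function name is hypothetical.

```python
from math import erf, sqrt

def chance_same_sign(critical_ratio):
    """Chance that the universe difference has the sign of the obtained one.

    The critical ratio is the difference divided by its probable error,
    so it is first converted to standard-error units (z).
    """
    z = 0.6745 * critical_ratio
    return 0.5 * (1 + erf(z / sqrt(2)))

for cr in (1.0, 2.0, 3.0, 4.0):
    print(cr, round(chance_same_sign(cr), 3))
```

A ratio of zero gives a chance of one half, which corresponds to having no evidence about the sign of the difference.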

The probable error of statistics as a measure of reliability.
In a number of texts the probable error of statistics is discussed
under the head of “reliability.”³ The use of this caption
tends to be misleading because it represents a half truth.
When the data are accurate, a complete statement would be,
“reliability of statistics derived from a large random sample of
accurate data when used as estimated values of the corresponding
universe.” When the data are inaccurate, and the type of
inaccuracy is that of variable errors of measurement, the
probable error formulae account for the effect of chance in the
selection of the sample and of the variable errors of measurement.
The formulae do not give a measure of the effect of systematic
errors of sampling (bias in the selection of the sample),
systematic errors of measurement, variable errors of validity,
and systematic errors of validity. This point is important.
The probable error was developed by astronomers and workers
in other sciences who were dealing with groups of data that
might be assumed to be random samples of universes. Furthermore,
these data did not involve errors of validity, and
systematic errors of measurement were disregarded. When the


1 Garrett, H. E. Statistics in Psychology and Education. New York:
Longmans, Green and Company, 1926, p. 135.

2 For a discussion of this point see Lincoln, E. A. “The Insignificance of
Significant Differences,” Journal of Experimental Education, 2:288-90, March,
1934.

3 For discussion of reliability see pages 177 and 199 f.
probable error technique was taken over by workers in the field
of educational research, they failed to keep in mind the condi- 
tions under which the technique was developed. 

Another point to be noted is in regard to what the probable
error is. Not infrequently one finds in educational writings a
statement to the effect that the value of a certain statistic
calculated from a sample is a certain magnitude plus or minus
its probable error. Such statements are absurd. The mean of
a group of data is the calculated mean of that group and is to be
considered accurate, provided the assumptions underlying the
procedure of calculating it are satisfied and no errors were made
in the arithmetical work. Under these conditions there is no
need to consider the reliability of the calculated value as long
as it is thought of as the mean of the data from which it was
computed. It is only when the calculated mean is used as an
estimate of the mean of a larger population or universe that the
probable error due to sampling is useful. Hence, it is not correct
to write such an expression as $M = 26.4 \pm 1.5$ unless
the intention is to designate the probable limits of the mean of
the universe. If the calculated mean is designated, the probable
error value should not appear. Confusion would be avoided
by employing $\tilde{M}$ to designate the mean of the universe as
suggested on page 106.

Conditions under which probable error formulae are to be
used. The conditions under which the probable error formulae
are to be used have been implied in the preceding paragraphs.
They are to be applied only when the calculated value
of a statistic is taken as the estimated value of the statistic for
a larger population or universe.¹ A further requirement is

1 As pointed out on page 103, the derivation of the formulae assumes an infinite
population or universe. When the random sampling has been from a finite
population, Peters and Van Voorhis have shown that the formulae for the mean
and the standard deviation should involve $\sqrt{1 - p}$ as a factor, p being used to
designate the per cent of the population included in the sample. When the
population is infinite, p, of course, becomes zero. The use of the usual formulae
with random samples from finite populations merely makes the probable limits
conservative, which, in view of the probable presence of errors in the data, is not
undesirable. Peters, C. C., and Van Voorhis, W. R. “A New Proof and Cor-
that the group of data from which calculations have been
made qualify as a large random sample of this larger population.
Large is a relative term but the formulae are generally
considered to be applicable when N is not less than 30.¹ The
requirement of randomness of the sample is not so simply
interpreted. If the data have been selected from the universe
by a process of random sampling, the requirement is satisfied,
but, as pointed out on page 58, random sampling is seldom
feasible in educational research. Hence, we are interested in
the question, can a group of data selected in some other way
qualify as a random sample?

Logically, the answer seems to be definitely in the negative.
If this position is accepted, it follows that probable error formulae
have little application in educational research. It may be
argued, however, that the calculation of the probable error of
a statistic, even when not justified on a logical basis, operates
to make one conservative in generalizing from the calculated
values of statistics, especially small differences. The answer to
this argument is that it represents a half truth. The calculation
of a probable error appears to have caused many persons to
believe that they have secured a measure of the probable degree
of the inaccuracy of the calculated statistic. This inference
is absurd. The accuracy of a statistic is affected by other data
faults.² As a practical procedure the probable error may be
calculated even though the sample is not known to be random,
provided it is not obviously biased, but caution should be exercised
in its interpretation. As a general principle one should


rected Formulae for the Standard Error of a Mean and of a Standard
Deviation,” Journal of Educational Psychology, 24:620-33, November, 1933.

1 When N is less than 30 a modified technique may be used. For example, see
Fisher's method of determining the significance of coefficients of correlation
obtained from small samples.

Fisher, R. A. Statistical Methods for Research Workers. Edinburgh: Oliver
and Boyd, 1925, pp. 159-62.

For a simplification of Fisher's method see Ezekiel, Mordecai. Methods of
Correlation Analysis. New York: John Wiley and Sons, Inc., 1930, pp. 256-57.
See also page 121.

2 Data faults and their effects upon statistics are considered in the following
chapter.
keep in mind that the probable error formulae are designed to
give only an estimate of the effect of chance and of variable
errors of measurement when the calculated value of a statistic
is taken as the estimated value of the statistic for the universe.
They do not yield any measure of the effect of other data faults
which may be relatively large. If a sample is judged to be
representative or approximately so, the calculated value of the
statistic may be used as an estimate of its value for the universe,
but a small probable error is not proof of a high degree
of representativeness.

The value of r for a specified population, given the value of r
for a non-random sample. Not infrequently the data available
for calculating the coefficient of correlation obviously form a
selected sample of the population for which the value of r is
desired. A special case is when a coefficient of reliability for a
test has been calculated from scores obtained from a single
grade group and the coefficient of reliability is desired for a
population including a sequence of grade groups. Kelley¹
has devised a formula for this case by assuming that the
variability ($\sigma_{1\infty}$) of the variable errors of measurement² is the same
for the narrow range of talent as it is for the wide range. Using
small letters to designate the statistics for the narrow range and
capital letters for those of the wide range, this assumption gives

$$\sigma_{1\infty} = \sigma_1\sqrt{1 - r_{1I}} = \Sigma_{1\infty} = \Sigma_1\sqrt{1 - R_{1I}}$$

$$\frac{\sigma_1}{\Sigma_1} = \frac{\sqrt{1 - R_{1I}}}{\sqrt{1 - r_{1I}}}$$

Solving for $R_{1I}$ we have

$$R_{1I} = \frac{\Sigma_1^2 - \sigma_1^2(1 - r_{1I})}{\Sigma_1^2}$$

This formula also involves the assumption that variable errors
on the two testings to determine reliability are uncorrelated.

1 Kelley, T. L. “The Reliability of Test Scores,” Journal of Educational
Research, 3:370-79, May, 1921.

2 For an explanation of the expression $\sigma_{1\infty} = \sigma_1\sqrt{1 - r_{1I}}$ see pages 132 f.
Holzinger has called attention to the doubtful validity of this
assumption.¹ Hence, the formula should be used with caution.
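
Kelley's formula for the reliability coefficient in a wide range may be sketched as follows. The sketch assumes, as the text states, that the variability of the variable errors of measurement is the same in the narrow and wide ranges; the function and argument names are hypothetical, and the figures are illustrative only.

```python
def wide_range_reliability(r_narrow, sd_narrow, sd_wide):
    """Estimate the reliability coefficient for a wide range of talent.

    Implements R = (S^2 - s^2 * (1 - r)) / S^2, where s and r are the
    standard deviation and reliability in the narrow range and S is the
    standard deviation in the wide range. Assumes equal error
    variability in both ranges.
    """
    return (sd_wide ** 2 - sd_narrow ** 2 * (1 - r_narrow)) / sd_wide ** 2

# Hypothetical case: reliability .60 in a single grade (SD = 10),
# wanted for a sequence of grades (SD = 20).
estimate = wide_range_reliability(r_narrow=0.60, sd_narrow=10.0, sd_wide=20.0)
print(round(estimate, 2))  # (400 - 100 * 0.40) / 400, about 0.90
```

When the two standard deviations are equal the function returns the narrow-range coefficient unchanged, which is a convenient check on the algebra. Because the underlying assumptions are doubtful, the result is better treated as a rough estimate than as a precise value.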

For the correlation between true measures of intelligence and 
true measures of achievement in a population made up of 
several grade groups, Kelley² has developed a formula that
may be written in the same form. 


Rat = 


El ^ crld ^ 
21 


The subscript A designates achievement and I, intelligence. The
derivation involves the assumptions that <7^, <ti, and 
are the same for the several grade groups entering into the wide 
range population and that the difference between the means of 
the successive grade groups is constant. 

Formulae have been developed for certain other cases of
selection,³ but the assumptions on which they are based restrict
their application. The formulae may be used in making estimates
of the correlation for the desired population when the
assumptions are not fully satisfied, but the results should not
be considered precise.⁴

The value of r for the corresponding homogeneous population. 

A population is heterogeneous with reference to a given trait 
when it exhibits individual differences relative to it. For exam- 
ple, a typical fifth-grade group is heterogeneous with reference 


1 Holzinger, K. J. Statistical Methods in Education. Boston: Ginn and
Company, 1928, pp. 251-54.

See also Brown, William, and Thomson, G. H. The Essentials of Mental
Measurement. Cambridge University Press, 1921, pp. 158 f.

2 Kelley, T. L. Interpretation of Educational Measurements.
Yonkers-on-Hudson: World Book Company, 1927, p. 202.

3 Pearson, Karl. “On the Influences of Double Selection on the Variation and
Correlation of Two Characters,” Biometrika, 6:111-12, 1908.

Pearson, Karl. “On the General Theory of the Influence of Selection on
Correlation and Variation,” Biometrika, 8:437-43, 1912.

Kelley, T. L. Statistical Method. New York: The Macmillan Company, 1923,
pp. 224-25.

Beatley, Bancroft. “Achievement in the Junior High School,” Harvard
Studies in Education, Vol. 18. Cambridge: Harvard University Press, 1932, p. 43.

4 For an illustration of the application of one of these formulae see Monroe,
Walter S., and Stuit, Dewey B. “The Interpretation of the Coefficient of
Correlation,” Journal of Experimental Education, 1:194 f., March, 1933.
to mental age. If the coefficient of correlation between two
measures is known for a population that is heterogeneous with
reference to a third measure, the value of r for the corresponding
homogeneous population can, under certain conditions, be
calculated by employing the technique of partial correlation.
See Chapter XI, pages 377 f.

F. Interpretation of Statistics

Interpretation of the mean and median. Both the mean and
the median are easy to understand. The mean is simply the
arithmetical average which is defined as the quotient of the
sum of the several measures divided by the number of items.
The calculation from a frequency distribution is simply an
economical technique. The meaning of the median is expressed
in its definition.¹ Both statistics designate a point on the scale
of measurement. In the calculation of both the mean and the
median no assumption is made with reference to the shape of
the distribution, but we tend to associate these statistics,
especially the mean, with the normal distribution. Hence, it is
desirable to keep in mind that the mean is affected by the
magnitude of each measure. An illustration of the mean is afforded
by the position of a fulcrum of a balanced beam on which
numerous weights have been placed. A small weight at one
end will balance a larger one on the opposite side near the
fulcrum.

The relative merits of the mean and the median. Since the
mean is seldom equal to the median unless the distribution is
perfectly normal, the question of the relative merits of these
two central tendencies arises.² Unless the skewness of the
distribution is marked, the mean is generally recommended. It is
rigidly defined mathematically and hence lends itself to
algebraic treatment. Its computation is usually a first step in the

1 See page 72.

2 For a more extended discussion of the comparative advantages and
disadvantages of the different measures of central tendency and of variability see
Odell, C. W. Educational Statistics. New York: The Century Company, 1925,
pp. 10^09, 140-42.
calculation of the standard deviation, median deviation, and
coefficient of correlation. The mean is based on all of the
measures and on their exact magnitude. The median is easier
to compute but it is less rigidly defined mathematically than the
mean, and does not lend itself to algebraic treatment. The median
should be used in preference to the mean when the distribution
includes extreme measures that make the mean materially
different from the median, when the exact magnitude
of some of the measures is not known, or when ease and rapidity
of computation is an important consideration.

The meaning of a measure of variability. A central tendency
is a point; a measure of variability is a distance. If a normal
distribution is assumed, the distance may be defined in terms
of the per cent of the distribution (area under the frequency
curve) that is marked off when the distance is measured in each
direction from the central tendency and perpendiculars are
drawn at the points thus located. In the case of the standard
deviation ($\sigma$) this per cent is 68.27. For the median deviation¹
the per cent is 50.

When comparing standard deviations or median deviations,
the absolute magnitude of the measures of variability may not
be as significant as ratios formed by dividing each standard
deviation or median deviation by the mean or median of its
distribution. The quotient formed by dividing the standard
deviation by the mean² is commonly referred to as the
coefficient of variability. It should, however, be used with caution,
because the zero points for much of our educational data are
arbitrary. For further reference to this topic see Chapter VIII.
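
The caution about arbitrary zero points can be made concrete with a short sketch. The scores are hypothetical; the second set is the first with 100 added to every score, as would happen if the zero point of the scale were placed differently. The function name is an assumption, and the quotient is multiplied by 100 as is usual.

```python
from statistics import mean, pstdev

def coefficient_of_variability(measures):
    """100 times the standard deviation divided by the mean."""
    return 100 * pstdev(measures) / mean(measures)

scores = [40, 45, 50, 55, 60]           # hypothetical test scores
shifted = [s + 100 for s in scores]     # same scores, zero point moved

print(pstdev(scores), pstdev(shifted))  # the standard deviation is unchanged
print(coefficient_of_variability(scores))
print(coefficient_of_variability(shifted))
```

The spread of the scores is identical in the two sets, yet the coefficient of variability falls to about one third of its former value, solely because the mean has been raised by the change of zero point.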

The interpretation of a coefficient of correlation — an introduc- 
tory statement. Both the median and the mean are easy to 
comprehend. The concepts that they represent are simple and 
the process of calculation can be rationalized. Even in the case 
of the standard deviation ($\sigma$), the general plan of calculation is

1 The median deviation of a distribution of measures should be called the
median deviation (MdD), and the term probable error (PE) should be used only
to refer to the median deviation of a distribution of errors.

2 Usually the quotient is multiplied by 100.
easily understood and the result has a graphical interpretation.
The calculation of a coefficient of correlation is more complicated
and hence difficult to rationalize. Furthermore, if one
succeeds in rationalizing the process of calculation, he will find
that he has made little progress toward the interpretation of
this statistic. A coefficient of correlation is a pure number. It
does not represent a point or a distance.

When one consults texts on educational statistics concerning
the meaning to be associated with a given coefficient, he is told
that the value of a coefficient of correlation cannot be greater
than +1.00 or less than -1.00; that a positive coefficient is
evidence that the larger magnitudes in one set of data tend to
be paired with the larger in the other and likewise the smaller
magnitudes in one set of data tend to be paired with the smaller
in the other; that a negative coefficient is evidence of inverse
pairing, the larger magnitudes in one set tending to be paired
with the smaller ones in the other; that the magnitude of the
coefficient is indicative of the completeness of this pairing,
being complete when r = ±1.00; and that when the coefficient
is 0.00 the pairing is on the basis of chance and no relation
exists between the two sets of measures. Obviously there is
some type of correspondence between the magnitude of the
coefficient and the degree of relationship between the two sets
of paired measures or the variables represented by them, but
educational statisticians have given only superficial attention
to the degree of relationship to be associated with particular
numerical values of r, such as .18, .30, .50, or .75.

In 1917 Rugg¹ suggested the following general interpretations:
r less than .15 to .20, correlation “negligible” or
“indifferent”; r from .15 or .20 to .35 or .40, correlation “present
but low”; r from .35 or .40 to .50 or .60, correlation “marked”;
r above .60 or .70, correlation “high.” Such interpretations
are limited in meaning and they may be misleading if not
actually erroneous because “negligible,” “marked,” “high,”

1 Rugg, H. O. Statistical Methods. Boston: Houghton Mifflin Company, 1917,
p. 256.
and the like are subject to two meanings. They may be used in
an absolute sense, i.e., to designate degrees of relationship
between zero correlation and perfect correlation, but they are
more commonly used to express comparisons within a similar
population. For example, a “high” man designates one that is
tall in comparison with other men although his height is actually
much less than that of a “high” tree or building. Hence, the
interpretation of a given coefficient of correlation as “high”
may be understood to mean either that the coefficient of
correlation is numerically greater than most of the coefficients obtained
in the field of education or that it is large in comparison with
those obtained from similar populations of data. If the second
meaning is intended, a general scheme of interpretation such
as that proposed by Rugg cannot be used.

Coefficients calculated from the scores obtained from the
administration of two forms of a test are, other things being
equal, higher than those calculated from intelligence test scores
and measures of silent reading ability; these are higher than
those summarizing the relationship between high school marks
and those received in college, and these in turn are higher than
the coefficients for IQ's and measures of teaching success.
Hence, a coefficient of .50 would be very high for IQ's and measures
of teaching success, slightly below average for high school
marks and college marks, low for the relation between intelligence
test scores and measures of silent reading ability,
and very low for the reliability of a test. The interpretation of
“marked” suggested by Rugg would not be a satisfactory
description of the relative degree of relationship in any of the
four cases. This condition suggests interpretation schemes
based upon the ranges of reported coefficients for different types
of data.¹ But the situation is further complicated by the fact
that the magnitude of a coefficient of correlation is affected by
the range of the population from which the data were obtained.
For example, if the population is selected from a single school

1 The ranges of reported coefficients for several types of data are given by Odell,
C. W. Educational Statistics. New York: The Century Company, 1925, p. 173.
grade, the coefficient of correlation between mental age and
achievement will be less than if it were calculated from a
population representing a sequence of two or more grades. In other
words, for measures of two traits the greater the standard
deviations of the distributions of these measures, the larger the
calculated coefficient of correlation.¹ Hence, a comparison of
coefficients of correlation, even when they have been obtained
from measures of the same traits, is likely to be misleading unless
the standard deviations of the distributions of the measures
are taken into account.

The use of such terms as “low,” “marked,” and “high,” is
not recommended. When a writer or speaker does employ them,
he should make clear whether the description of the degree of
relationship is being made on an absolute or relative basis.
More satisfactory interpretation procedures will be described
in connection with the consideration of the uses of correlation.

The uses of correlation. The uses of correlation are indicated 
by the questions with reference to which coefficients may be 
interpreted. 

1. Given the coefficient of correlation for a random sample, is there
any relationship between the two sets of paired measures in the
universe?

2. When r is a coefficient of reliability, what is the magnitude of
the variable errors of measurement?

3. When a regression equation derived from the correlation table
is used as a formula of prediction, how accurate are the predictions?

4. Given the coefficient of correlation between two sets of paired
measures, what is the degree of relationship between these sets
of measures, or the traits underlying them, within the population
represented by them?

Only the first of these questions will be considered here. The
second is treated in Chapter V, the third in Chapter X, and
the fourth in Chapter XI.

The statistical significance of a coefficient of correlation. The
statistical significance of a coefficient of correlation relates to

1 It is, of course, necessary that other conditions remain the same.
the first of the questions listed above. A value of r not equal
to 0.00 is evidence of the existence of a relationship between the
two sets of paired measures within the population to which the
coefficient applies. Of course, a very small value of r such as .05
or even .10 indicates only a very slight degree of relationship,
and, if the number of cases is less than 30 or the assumptions
underlying the calculation of the coefficient are not known to
be fully satisfied, this interpretation should be made with
caution.

A positive coefficient denotes a positive or direct relationship
between the two sets of paired measures. A negative coefficient
denotes an inverse relationship. Hence, if the value of r
for the universe should not have the same sign as the calculated
value, the meaning would be reversed. A calculated coefficient
of correlation is said to be statistically significant when the
probability is very slight that the coefficient for the universe
would be zero or have the opposite sign. When the data, from
which r has been calculated, have been obtained by a process
of random sampling or may be assumed to qualify as a random
sample, this probability may be obtained by comparing the
calculated coefficient with its probable error.¹ It is a common
practice to designate a coefficient as statistically significant
when it is greater than four or five times its probable error. A
statistically significant coefficient of correlation may be interpreted
as meaning that in the universe there is at least a slight
degree of relationship between the two traits or phenomena. It
should be noted, however, that the procedure for determining
statistical significance requires that the measures from which r
has been calculated be a random sample of the universe. Hence,
determinations of statistical significance should be made with
caution. Unless the sample is large and the assumption of its
random character appears justified, the label of “statistically
significant” should not be applied when the coefficient is only
four or five times its probable error. Many writers have judged
a coefficient to be statistically significant when a critical exam-

1 See page 104 for the formula 
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ination of the evidence would reveal that the sample probably 
was not random A further reason for caution in pronouncing a 
coeflScient of correlation statistically significant is found in 
the fact that the calculated value may be materially affected 
by the choice of the intervals in the correlation table. ^ If the 
data from which the calculation is made involves variable 
errors, the value obtained will be attenuated ^ Hence, the 
calculated value may not be the correct value for the label 
expressed or implied and when this is the case the usual method 
of determining statistical significance is not sound 
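
The comparison of a coefficient with its probable error may be sketched in a few lines of Python. This is an illustration only, not the authors' own procedure. It assumes that the formula referred to on page 104 is the customary large-sample one, PE = .6745(1 − r²)/√N; the function names and the values of r and N are hypothetical.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Probable error of r, assuming the customary formula .6745(1 - r^2)/sqrt(N)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def is_statistically_significant(r: float, n: int, multiple: float = 4.0) -> bool:
    """Rule of thumb: |r| must exceed `multiple` times its probable error."""
    return abs(r) > multiple * probable_error_of_r(r, n)


# Hypothetical coefficients and numbers of cases, chosen only for illustration.
for r, n in [(0.10, 100), (0.30, 100), (0.30, 30)]:
    pe = probable_error_of_r(r, n)
    print(f"r = {r:.2f}, N = {n:3d}: PE = {pe:.4f}, r/PE = {r / pe:.2f}, "
          f"exceeds 4 PE: {is_statistically_significant(r, n)}")
```

A calculation of this kind presupposes a random sample; it gives no evidence on whether that assumption is justified.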
CHAPTER V


THE FAULTS OF DATA AND THEIR EFFECTS 

Purpose of this chapter. The calculated value of a statistic
may not be the correct value even when no arithmetical errors
have been made in the process. For example, the mean
calculated from a frequency distribution may not be the correct
value of the mean of the data. On page 103 it was pointed out
that frequently the value of the mean or other statistic
calculated from a sample of data is labeled or used as the value of
the statistic for a larger population or universe. When this is
done, the “calculated value” is likely not to be precisely the
“correct value” for this label. Other types of labels are
explicitly or implicitly assigned to calculated or obtained values
of statistics. A term is needed to refer to the degree of
agreement between the “correct value” of the statistic as labeled and
the “obtained value.” The word “accuracy” should not be used
because it is widely employed with a different meaning. A term
with a broader meaning is needed. “Dependability” is suggested.

The dependability of the calculated value of a statistic will
vary with the label explicitly attached to it or implied in its
interpretation. The types of labels may be illustrated by
considering those that may be attached to the mean calculated
from a frequency distribution of achievement test scores. These
labels are:

1. Mean of the obtained scores.

2. Mean of the true scores of the group tested under standard testing
conditions.

3. Mean of the true scores of a larger population or universe.

4. Mean of true measures of achievement as specified of the
population tested.

5. Mean of true measures of the achievement as specified of a larger
population or universe.

The value of the mean calculated from the frequency distribution
may be 100 per cent dependable when assigned the first
label, but this is not necessarily the case. In general, the
dependability of a calculated value will decrease from label to
label in the order given.

It is the purpose of this chapter to consider the causes that
contribute to undependability and the effects of these causes,
commonly referred to as “data faults,” upon the various
statistics. In certain cases a more correct value of the statistic as
labeled may be calculated. In other cases the probable limits
of the correct value may be determined. In general, however,
the degree of dependability can only be estimated. Hence, it
is important that an investigator understand the possible faults
of data and their effect upon the various statistics. Estimating
the dependability of the findings of educational research is the
first step in their interpretation.

In the following pages arithmetical accuracy of the calculations
will be assumed. It will also be assumed that the number
of decimal places or significant figures is consistent with the
approximate nature of the data.¹ Attention will not be given
to faults of procedure or to the erroneous interpretation of
statistics. The restriction of the scope of this chapter to the
faults of data and their effects is not intended to imply that
other matters are not important. As a matter of fact, many
published studies are to be criticized for other reasons. Some are
not wisely planned. The investigator may have failed to gather
all of the data his problem called for. He may have failed to
use appropriate statistical techniques. He may have allowed
his prejudice to influence his interpretation of his findings.

¹ See pages 62 f.

A. Types of Data Faults

The faults of quantitative data. When statistics are interpreted
without generalization, two general types of data faults
require consideration: (1) errors, and (2) failure of the data
as a group to conform to the conditions (assumptions) underlying
the formulae used or associated with the interpretation of
a mean, standard deviation, coefficient of correlation, or other
statistic. When the interpretation of a statistic is extended to
a larger population or universe, the determination of the
dependability of the calculated value requires also consideration
of the degree to which the data collected are representative of
the population for which the generalization is desired.

In the following pages these faults are first described briefly. 
Then each is considered in detail, giving attention to their 
causes and their effects upon statistics. 

Errors. The difference between a measure and the criterion
by which it is judged is called an error. Suppose the height of
a child is measured as 57.5 inches; the error is .5 if 57.0 inches
is the criterion, −1.5 if 59.0 inches is the criterion, and so on.
The criterion by which an obtained measure is to be judged is
implied in the label given it or a derived statistic when that
label is precisely expressed. In the field of physical
measurements, little attention is given to this principle because the
criterion is usually the “true measure.” But, in the case of test
scores and certain other types of educational data, we may
employ any one of several labels, each implying a different
criterion. The precise meaning of the label “obtained score”
varies with the conditions under which the test was
administered. First-trial scores and second-trial scores indicate two
variations. “True score,” which has a meaning corresponding
to “true measures” in the physical realm, usually implies
“standard testing conditions.” In addition, we have labels
specifying measures of a designated ability or trait.¹ Considering
the variations possible, the necessity of precise labels is
obvious. Unfortunately, complete and precise labels are seldom
attached to educational data. The term “score” is frequently
used in the sense of “true score.” Sometimes “score” is
apparently used with the meaning of measure of a particular
ability or trait. As a basis for considering errors in data, it
is necessary to give careful attention to the labels attached
to them.

¹ The reader may find it helpful to refer to the discussion of the meaning of the
measurement of traits and abilities on pages 172 f.

Any obtained measure may be thought of as equivalent to
the algebraic sum of the corresponding true measure and an
error. If we use $X$ to designate an obtained measure and $X_\infty$
the corresponding true measure, their relation may be expressed
as follows:

$$X = X_\infty + e$$

The error designated by $e$ may be either positive or negative.
For example, if a pupil's true score is 42 and the obtained score
is 37, this measure is to be thought of as involving a negative
error of −5, i.e., 37 = 42 − 5.

In educational research a worker is usually concerned with
groups of data rather than with individual items. When
attempting to determine the effect of the data faults upon a
statistic calculated from a group of data, it is necessary to
think of the error in each item as consisting of two parts, one
systematic and the other variable. In terms of symbols this
relationship may be expressed as follows:

$$X = X_\infty + e_{\mathrm{sys}} + e_{\mathrm{var}}$$

As the term implies, a systematic error is one that conforms
to some system. An infinite variety of systems of errors is
theoretically possible, but in educational research our concern
is with a systematic error that affects all measures of a given
group in the same direction. For example, if four and one-half
minutes are allowed in administering a speed test whose specified
time limit is four minutes, and none of the pupils finish, the
size of the systematic errors due to the allowance of too much
time will tend to be proportional to the scores on the test: the
greater the score, the greater the error. The simplest type of
systematic error is one that is constant, that is, the same for
all items of data in the group. When this is the case, the term
constant error is used. It is likely that a systematic error in
educational data is rarely perfectly constant, but unless there
is obviously some relationship between the error and the
magnitude of the measures, it is commonly considered as
constant.
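
The difference between a constant and a proportional systematic error may be seen in a small numerical sketch. The true scores below are hypothetical. The factor of 1.125 rests on an added assumption, namely that each pupil works at a uniform rate, so that four and one-half minutes in place of four raises every score by one-eighth.

```python
import statistics

# Hypothetical true scores on a speed test (not data from the text).
true_scores = [20, 25, 30, 35, 40]

# Constant error: every score is raised by the same amount.
constant_error = 3
with_constant = [x + constant_error for x in true_scores]

# Proportional error: assumed uniform working rate, 4.5 minutes instead of 4.
time_factor = 4.5 / 4.0
with_proportional = [x * time_factor for x in true_scores]

for label, scores in [("true scores", true_scores),
                      ("constant error", with_constant),
                      ("proportional error", with_proportional)]:
    print(f"{label:>18}: mean = {statistics.mean(scores):6.2f}, "
          f"S.D. = {statistics.pstdev(scores):5.2f}")
```

Under these assumptions the constant error shifts the mean and leaves the standard deviation unchanged, whereas the proportional error alters both.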

Variable errors are chance errors. Within a group of measures
they vary with respect to both magnitude and direction. In a
large, unselected group of measures there are approximately
as many negative variable errors as there are positive ones and
if the variable errors in such a group could be separated out and
assembled in a frequency distribution, its shape would
approximate that of the normal probability curve and its mean would
be zero. Variable errors occur in precise physical measurements.
If the diameter of one hundred steel balls is measured to
one-hundredth of a millimeter by means of a micrometer calliper,
a second set of measurements is not likely to agree with the first.
By repeating the measurement several times and using the
mean of the results for each ball as its true measure, the variable
errors in the first set of measurements may be calculated.
Repeated measurement of human abilities and traits is not feasible
because the act of measuring will usually introduce a practice
effect which designates a positive systematic error.
Consequently, variable errors in educational measurements cannot
be isolated, but it is helpful to conceive of them as if they
could be.

An idea of the magnitude and distribution of the variable
errors that occur in educational measurements may be obtained
by calculating the differences between the scores on two
equivalent forms of a test when the interval between their
administration has been so short that there has been little change in the
ability of the pupils. Table VI gives the distribution of the
differences between the scores made by a group of fifth-grade
pupils on two forms of Monroe's General Survey Scale in
Arithmetic. For example, one pupil made a score of 61 on the
first trial and 59 on the second. The difference between the
two scores is −8. The differences given in Table VI are not
the variable errors of either set of scores. Both sets of scores
involve variable errors and one or both of them involves a
systematic error.¹ In order to obtain the exact magnitude of
the variable error of measurement for a given pupil, it would be
necessary to secure his true score and to subtract it from the
obtained score. This difference would be the variable error of
measurement involved in his score.
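
The treatment of such differences may be sketched as follows. The scores are hypothetical and are not those of Table VI. The last step rests on a further assumption not made in the text: that the variable errors of the two forms are independent and equally variable, in which case the standard deviation of the differences is √2 times that of the errors in either form.

```python
import math
import statistics

# Hypothetical scores of six pupils on two equivalent forms (not Table VI).
first_form = [61, 48, 55, 70, 42, 66]
second_form = [63, 53, 54, 75, 47, 68]

# Difference for each pupil: second-trial score minus first-trial score.
differences = [b - a for a, b in zip(first_form, second_form)]

# A mean difference far from zero points to a systematic error in one or both sets.
mean_difference = statistics.mean(differences)

# The spread of the differences reflects the variable errors of both sets together.
sd_difference = statistics.pstdev(differences)

# Assumption: independent, equally variable errors in the two forms.
approx_sd_of_errors = sd_difference / math.sqrt(2)

print("differences:", differences)
print(f"mean difference = {mean_difference:.2f}")
print(f"S.D. of differences = {sd_difference:.2f}")
print(f"approximate S.D. of variable errors in one form = {approx_sd_of_errors:.2f}")
```

The differences themselves are not the variable errors of either set of scores; only their distribution is informative.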

Non-conformity with assumed group characteristics. Attention
has been called in the preceding chapter to the fact that
in making computations from frequency distributions the
assumption is made that the measures are uniformly distributed
within the several intervals or that the mid-point of an interval
may be used as the average value of the measures within it.
Linearity of relationship is assumed in the derivation of the
formula for the product-moment coefficient of correlation. Other
assumptions are made in connection with other formulae.
Obviously, non-conformity with such assumptions is likely to
introduce an error in the calculated value of a statistic. Slight
departures from conformity are usually not very significant,
but precision in research requires that the group characteristics
of the data be considered and if the degree of non-conformity
appears significant, the use and interpretation of the calculated
values of the statistics should be limited accordingly.

In our use and interpretation of statistics we commonly as- 
sume certain conditions with respect to the data they have 
been derived from. For example, we commonly assume that a 
mean has been calculated from a distribution of measures that 
is approximately normal. Failure to recognize that the data 
may not satisfy the assumed conditions is likely to lead to 
erroneous interpretations.

¹ The presence of a systematic error in one or both sets of scores is shown by
the mean of the differences. If there were no systematic error, this mean would
be approximately zero.

Non-representativeness of data. In Chapter III it was
pointed out that frequently it is not possible or at least not
convenient to collect all of the data called for by the problem.
This is almost certain to be the case whenever the conclusion
desired is a generalization. When only a sample of the data
has been collected, any non-representativeness of this sample is
a fault that must be considered. The value of a statistic
calculated from a non-representative sample will usually differ from
that for the total population or universe. For example, data
from a bright fifth-grade class will not yield values that are
the same as those for a population of typical fifth-grade pupils.
If the blanks returned in a questionnaire study are from a
selected group, as is not infrequently the case, the findings may
not be true for the total population to which the questionnaire
was mailed.
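
The effect of a selected sample upon the mean may be illustrated with an artificial universe. Everything in the sketch is hypothetical: the universe of scores, the size of the samples, and the definition of the “bright” group as the highest 27 scores.

```python
import random
import statistics

# Hypothetical universe: one score at each value from 60 to 140 (mean 100).
population = list(range(60, 141))

# A random sample of 27 cases; the seed is fixed only to make the run repeatable.
rng = random.Random(0)
random_sample = rng.sample(population, 27)

# A selected ("bright") group: the 27 highest scores in the universe.
selected_sample = sorted(population)[-27:]

print(f"mean of universe        = {statistics.mean(population):.1f}")
print(f"mean of random sample   = {statistics.mean(random_sample):.1f}")
print(f"mean of selected sample = {statistics.mean(selected_sample):.1f}")
```

The mean of the random sample departs from that of the universe only by chance, whereas the mean of the selected group is displaced in a fixed direction however many cases are taken.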

Errors of measurement and validity.¹ On page 123 the point
was made that the magnitude of the error in a datum depends
upon the criterion by which it is judged. The labels, which imply
the criteria by which test scores are judged, may be classified
under two heads: (1) those that specify scores, obtained or true,
resulting from the administration of the test under certain
conditions, and (2) those that specify measures of a certain ability
or trait. When the label is of the first type, only errors of
measurement are involved. When measures of an ability or
trait are specified, the possibility of errors of validity must
also be considered. These classes of errors may be considered
subordinate to both systematic and variable errors, thereby
creating four types of errors.

Errors of measurement are those due to the variability of
human responses,² the structure of the test, variations in its
administration, and subjectivity in scoring pupil performances.
If the label, explicit or implied, specifies “true scores for the
conditions attending the administration of the test,” only
variable errors of measurement are involved. If the label specifies
“true scores for conditions other than those that prevailed,”
a systematic error may be introduced. This will be in addition
to the variable errors and an obtained score should be thought of
as the algebraic sum of the corresponding true score, a variable
error, and a systematic error.

¹ This discussion is in terms of test scores. With modification, it is applicable
to certain other types of educational data.

² Holzinger has used the term “response errors” to designate much the same
concept as is here represented by “errors of measurement.” Since the
“variability of human responses” is only one of several causes that are operative in the
measurement process, the phrase “errors of measurement” seems to be
preferable. See Holzinger, K. J. Statistical Methods for Students in Education. Boston:
Ginn and Company, 1928, pp. 250-55.

Errors of validity are due to indirect measurement, i.e., the
use of measures of one thing as measures of something else.
As an illustration, consider the hypothetical situation in which
the problem calls for measures of weight, but the investigator
has no instrument for measuring the weight of children. If he
substitutes measures of height for the needed measures of
weight, it is obvious that variable errors are introduced.¹ These
errors are not due to the process of measurement. They result
from the substitution and are called variable errors of validity.

Although this illustration is an imaginary one, the conditions
that it portrays are similar to those that prevail whenever
the process of measurement is indirect.² For example, general
intelligence, which is commonly thought of as ability to learn,
is measured indirectly by measuring certain types of
achievement or what pupils have learned. Ability to spell words when
writing sentences and paragraphs, as they are being composed,
is measured by measuring the ability to spell dictated words.
A silent reading test consisting of a series of short,
unconnected paragraphs, each followed by a question to be answered
before the next exercise is read, appears to measure an ability
that differs significantly from the ability to read typical
material. In many cases the substitution is less obvious, but when
the label specifies measures of some ability or trait rather than
merely test scores, the possibility of errors of validity is created.

¹ A coefficient of correlation between height and weight of .69 is reported by
Gates: Gates, A. I. “The Nature and Educational Significance of Physical Status
and of Mental, Physiological, Social, and Emotional Maturity,” Journal of
Educational Psychology, 15:329-58, September, 1924.

² For precise meaning of indirect measurement, see pages 144-46 and 172.

When a substitution is made, the measures may be thought
of as being in terms of a ratio or an index. Hence, the
satisfactoriness of indirect or substitute measures depends upon the
constancy of the ratio of the magnitude of the ability or trait
actually measured to that of the ability or trait whose
measurement is desired. Unless there is perfect correlation between
true measures of the ability or trait actually measured and true
measures of the specified ability or trait, variable errors of
validity are introduced. In the illustration of substituting
measures of height for measures of weight the satisfactoriness
of using the former for the latter depends upon the constancy
of the ratio of height to weight. This ratio is not the same for
all children, and the variation indicates the introduction of
variable errors of validity as the result of the substitution.

The presence of a systematic error of validity is a matter
of concern when groups of measures are being compared or one
is interested in the absolute magnitude of the measures. If the
ratio of the mean of the substituted measures to the mean of the
desired measures is not the same for two populations, one group
of measures involves a systematic error of validity. If an
unselected age group of boys is being compared with a corresponding
group of girls using measures of height for measures of
weight, the ratios will differ slightly for physiological reasons.
Hence, a systematic error of validity would be introduced by
the substitution. If two groups of girls are being compared
with reference to weight using measures of height, and the
members of one group were wearing high-heeled shoes and those
of the other low-heeled shoes, a systematic error of validity
would be introduced for this reason.

B. Errors of Measurement

Causes of variable errors of measurement in test scores.¹
The responses of a pupil on successive trials of a test tend to
be variable. A striking illustration of this variability has been
furnished by Ashbaugh,² who arranged a fifteen-minute
dictation exercise so that each test word occurred three times. The
three spelling performances of about a fourth of the pupils
were not consistent. Some pupils spelled a word correctly the
first time and misspelled it in the later writings. Others
misspelled a word the first time, spelled it correctly the second
time, and misspelled it again on the third trial. Variability of
performance is also shown by the fluctuations in individual
learning curves. The cause of variations in the response of
individual pupils in successive testings is complex. A pupil's
performance at a given time is conditioned by his mental state,
the effort he makes, his acquaintance with the test and with
testing procedure in general, the environment under which he
takes the test, the instructions given to him by the examiner,
and possibly by other factors. It is possible that the ability
of a pupil in a given field is subject to short time fluctuations.
It is sufficient for our present purpose to note that variability
of performance is characteristic of children and of adults.
Some are known as erratic performers, others are classified as
consistent performers.

¹ The following references relate to variable errors of measurement:

Holzinger, K. J. “An Analysis of the Errors in Mental Measurement,”
Journal of Educational Psychology, 14:278-88, May, 1923.

Symonds, P. M. “Factors Influencing Test Reliability,” Journal of
Educational Psychology, 19:73-87, February, 1928.

Worcester, D. A. “Prevailing Errors in New-Type Examinations,” Journal
of Educational Research, 18:48-52, June, 1928.

² Ashbaugh, E. J. “Variability in Spelling,” School and Society, 9:93-98,
January 18, 1919.

Another good illustrative reference is Clark, J. R., and Vincent, E. L. “A
Study of Variability in Arithmetic,” Journal of Educational Psychology, 16:267-74,
April, 1925.

Subjectivity in the scoring of the test papers contributes
to the variable errors of measurement. Evidence of this is
furnished by the numerous studies of the rating of the same
examination papers by two or more persons. Although investigations
of the Starch-Elliott type exaggerate the variable errors due to
the subjectivity of the rating of examination papers, these errors
are relatively large. Even when the scoring is objective, accidental
errors may occur. In a recent study the Otis Self-Administering
Test of Mental Ability was administered to 550 fifth-grade
pupils. When the scoring was checked, it was discovered that
the teachers had scored 362 papers correctly, 45 were scored one
point too high, 142 were scored three points too low, and one
was scored four points too low. If these errors had been
uncorrected, the mean point score of 29.0 would have been reduced
0.31. A trained scorer is not likely to make many errors, but if
precision is desired, the work should be checked.³

³ The following studies support this contention:

Dearborn, W. F., and Smith, W. C. “The Results of Rescoring Five Hundred
Thirty Dearborn Tests,” Journal of Educational Psychology, 20:177-83, March,
1929.

Pintner, Rudolph. “Accuracy in Scoring Group Intelligence Tests,” Journal
of Educational Psychology, 17:470-75, October, 1926.

Describing the variable errors of measurement in a group of
scores.⁴ Assuming that the variable errors in a group of scores
form a normal distribution whose mean is zero, their magnitude
is described by a measure of the variability of this distribution.
Although the variable errors cannot be isolated so that this
distribution can be formulated, its standard deviation may be
derived. If $X_1$ designates the scores yielded by a test and $X_\infty$
the corresponding true scores, the problem is to describe the
differences,⁵ $X_1 - X_\infty$, which are the variable errors of
measurement. If $X$ is used to designate raw scores and $x$ the
corresponding deviation scores, we may write

$$X_1 = M_1 + x_1, \qquad X_\infty = M_\infty + x_\infty$$

Hence

$$X_1 - X_\infty = x_1 - x_\infty + (M_1 - M_\infty)$$

If no systematic error is involved and N is large, $M_1$ approaches
$M_\infty$ and may be taken as equal to it. Hence, $x_1 - x_\infty$ may be
substituted for $X_1 - X_\infty$ to represent the variable errors of
measurement. The square of the standard deviation of the
distribution of variable errors of measurement, the square of
the standard error of measurement, is obtained by substituting
in and squaring the usual standard deviation formula:

$$
\begin{aligned}
\sigma^2_{e_{\mathrm{var}}} &= \frac{\sum (x_1 - x_\infty)^2}{N} \\
&= \frac{\sum x_1^2}{N} - \frac{2\sum x_1 x_\infty}{N} + \frac{\sum x_\infty^2}{N} \\
&= \sigma_1^2 - 2 r_{1\infty}\,\sigma_1 \sigma_\infty + \sigma_\infty^2
\end{aligned}
$$

If we have a second set of measures of the ability for the same
population,

$$r_{1\infty} = \sqrt{r_{11}} \quad \text{and} \quad \sigma_\infty = \sigma_1 \sqrt{r_{11}} \quad \text{(See page 151.)}$$

Substituting these values,⁶

$$
\begin{aligned}
\sigma^2_{e_{\mathrm{var}}} &= \sigma_1^2 - 2\sqrt{r_{11}}\,\sigma_1 \sigma_1 \sqrt{r_{11}} + \sigma_1^2 r_{11} \\
&= \sigma_1^2 - 2\sigma_1^2 r_{11} + \sigma_1^2 r_{11} \\
&= \sigma_1^2 - \sigma_1^2 r_{11} \\
&= \sigma_1^2 (1 - r_{11})
\end{aligned}
$$

$$\sigma_{e_{\mathrm{var}}} = \sigma_1 \sqrt{1 - r_{11}}$$

Since the median deviation (probable error) is easier to
interpret we generally use the formula⁷

$$PE_{e_{\mathrm{var}}} = PE_{1\infty} = .6745\,\sigma_1 \sqrt{1 - r_{11}}$$

It should be noted that $r_{11}$ and $\sigma_1$ are derived from the same
population. If $\sigma_1$ is not approximately equal to $\sigma_2$, the following
formula should be used:

$$PE_{1\infty} = .6745 \ldots$$

This derivation of the formula for the probable error of
measurement is based on two assumptions: first, that the
number of cases (N) is large enough so that $M_1$ may be taken
as equal to $M_\infty$; second, that the variable errors of
measurement are uncorrelated with the true measures with which they
are combined. The use of the formula introduces the
assumption that the variable errors in a large unselected group of test
scores and the variable errors in the scores resulting from a large
number of administrations of a test to a single pupil form similar
normal distributions. Unless the structure of the test is
defective, this assumption is probably adequately satisfied.

⁴ This topic is given further consideration in Chapter VII.

⁵ It is assumed that no systematic error is involved.

⁶ It should be noted that in making these substitutions the assumption is
introduced that the variable errors of measurement are uncorrelated with the
true scores with which they are combined. (See page 205.) This assumption
does not appear to be fully satisfied. Hence, the formula for $\sigma_{e_{\mathrm{var}}}$ should be
regarded as only an approximation.

⁷ Several symbols have been used to designate “probable error of
measurement.” P.E.Meas has the merit of being essentially an abbreviation, but the
symbol $PE_{1\infty}$ seems to be preferable. Similarly, $\sigma_{1\infty}$ is recommended for the
standard error of measurement.
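
The formulas for the standard error and the probable error of measurement, σ₁√(1 − r₁₁) and .6745 σ₁√(1 − r₁₁), may be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the standard deviation of 10 and reliability coefficient of .91 are invented values. The third function evaluates the unsimplified expression σ₁² − 2r₁∞σ₁σ∞ + σ∞² after the substitutions, to confirm that it reduces to σ₁²(1 − r₁₁).

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """sigma_1 * sqrt(1 - r_11)."""
    return sd * math.sqrt(1.0 - reliability)


def probable_error_of_measurement(sd: float, reliability: float) -> float:
    """.6745 * sigma_1 * sqrt(1 - r_11)."""
    return 0.6745 * standard_error_of_measurement(sd, reliability)


def error_variance_before_simplifying(sd: float, reliability: float) -> float:
    """sigma_1^2 - 2 r_1inf sigma_1 sigma_inf + sigma_inf^2, after substitution."""
    r_1_inf = math.sqrt(reliability)        # r_1inf = sqrt(r_11)
    sd_true = sd * math.sqrt(reliability)   # sigma_inf = sigma_1 * sqrt(r_11)
    return sd ** 2 - 2.0 * r_1_inf * sd * sd_true + sd_true ** 2


# Hypothetical values: S.D. of obtained scores 10, reliability coefficient .91.
sd_1, r_11 = 10.0, 0.91
se = standard_error_of_measurement(sd_1, r_11)
pe = probable_error_of_measurement(sd_1, r_11)
print(f"standard error of measurement = {se:.3f}")
print(f"probable error of measurement = {pe:.3f}")
print(f"unsimplified variance = {error_variance_before_simplifying(sd_1, r_11):.3f}, "
      f"simplified = {se ** 2:.3f}")
```

Because the derivation assumes errors uncorrelated with true scores, the values so obtained are to be regarded as approximations.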

The probable error of measurement gives merely the limits
between which we may expect to find 50 per cent of the variable
errors of measurement of a typical group of scores. For example,
if the probable error of measurement for a given test has been
found to be 4.0, 50 per cent of the variable errors of
measurement are numerically greater than ±4.0, approximately one-half of them
being positive. It also means that 50 per cent of them are not
larger than ±4.0. In the case of a given pupil, we can state
only the chances that the variable error of measurement of his
score does not exceed certain limits. In the above illustration
the chances are just even that the variable error of
measurement in his score is not larger than ±4.0. The chances are 4.6
to 1 that it is between −8.0 and +8.0. The chances for other
limits also may be stated.¹
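
The chances for any limits follow from the normal probability curve, one probable error being .6745 standard deviations. The sketch below assumes normally distributed variable errors with mean zero, as the text does; the function names are hypothetical.

```python
import math


def chance_within(k: float) -> float:
    """Probability that a normally distributed error lies within +/- k probable errors."""
    z = 0.6745 * k  # k probable errors expressed in standard deviations
    return math.erf(z / math.sqrt(2.0))


def odds_within(k: float) -> float:
    """Odds (chances to 1) that the error lies within +/- k probable errors."""
    p = chance_within(k)
    return p / (1.0 - p)


probable_error = 4.0  # the value used in the illustration above
for k in (1, 2, 3):
    limit = k * probable_error
    print(f"within +/-{limit:4.1f}: probability = {chance_within(k):.3f}, "
          f"chances = {odds_within(k):.1f} to 1")
```

For limits of one and two probable errors the computation reproduces the even chances and the chances of 4.6 to 1 stated in the text.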

A probable error of measurement of 5.0 indicates relatively
large variable errors when the mean score is 25.0, but relatively
small ones when the mean score is 150.0. The magnitude of the
mean score depends upon the size of the unit of the scale of 
measurement and the location of the zero point. Since the loca- 
tion of the zero point is usually arbitrary, the ratio of the prob- 
able error of measurement to the mean has a limited significance, 
and ratio comparisons of tests, with reference to the magnitude 
of variable errors of measurement in the scores yielded by them, 
should be made with caution. Comparisons in terms of age or 
grade units are likely to be more dependable. 

¹ See page 106.

Magnitude of variable errors of measurement to be expected
in test scores. The variable errors of measurement in test
scores are much greater than the corresponding errors in physical
measurements. In a critical study of silent reading tests² it was
shown that the probable error of measurement for some tests
was greater than 25 per cent of the mean score. In fact, for
Brown’s Silent Reading Test, it was found to be more than 50
per cent. In the tests which make up the Illinois Examination
only twelve of forty-two probable errors of measurement which
were calculated were greater than 10 per cent of the mean score.³
The authors of the New Stanford Achievement Test⁴ announce
that the probable error of measurement for this battery of tests
is approximately two months of educational achievement. It
is likely that these authors have succeeded in reducing the
variable errors of measurement to a lower minimum than has been
attained by most other test makers. This has been accomplished
in part through extending the length of the test.⁵

In another place the senior author has discussed the relative
magnitude of the errors in the scores yielded by standardized
tests and the errors in the marks assigned to examination
papers.⁶ The evidence presented indicates that the variable
errors of measurement for a number of widely used standardized
educational tests are only slightly less than the variable errors
of measurement for examinations of the essay type.

Causes of systematic errors of measurement in test scores.
The causes of systematic errors of measurement in test scores
consist of those influences that affect all scores of a group in the
same direction. They may be classified under the following
heads: (1) the administration of the test including the time
allowed, the directions given to the pupils, and the attitude of
the examiner; (2) acquaintance of pupils with the general
procedure of testing; (3) acquaintance of pupils with the type of
exercises in the test; (4) attitude of pupils toward the test
which is likely to be influenced by the manner in which the test
is administered; and (5) bias of the person rating the
performances when this process is subjective.

² Monroe, W. S. “A Critical Study of Certain Silent Reading Tests,”
University of Illinois Bulletin, Vol. 19, No. 22, Bureau of Educational Research
Bulletin, No. 8. Urbana: University of Illinois, 1922. 52 pp.

³ Monroe, W. S. “The Illinois Examination,” University of Illinois Bulletin,
Vol. 19, No. 9, Bureau of Educational Research Bulletin, No. 6. Urbana:
University of Illinois, 1921, p. 49.

⁴ Kelley, T. L., Ruch, G. M., and Terman, L. M. New Stanford Achievement
Test: Manual of Directions. Yonkers-on-Hudson: World Book Company, 1929.
16 pp.

⁵ See pages 208-09.

⁶ Monroe, W. S., and Souders, L. B. “The Present Status of Written
Examinations,” University of Illinois Bulletin, Vol. 21, No. 13, Bureau of Educational
Research Bulletin, No. 17. Urbana: University of Illinois, 1923, pp. 30-42.

No specific procedures can be prescribed for identifying the 
causes that are operative in a particular case, but a few illus- 
trations will indicate the magnitude of the systematic errors 
to be expected in test scores. In the following illustrations the
systematic errors of measurement are referred to as constant,
although they may be more complex.

Illustrations of evidence of the presence of constant errors of 
measurement in test scores. If two forms of a test are administered
to a group of pupils, the second set of scores will be subject
to a “practice effect.” The difference¹ between the mean
of the first-trial scores and the mean of the second-trial scores
is an index of the relative magnitude of the constant error in
the two sets of scores. This difference, however, should not be
interpreted as being the true magnitude of the total constant
error of the second-trial scores. It is possible that the first-trial
scores also involved a constant error due to failure to secure
standard testing conditions or to other causes. However, when
the means of the two sets of scores are not equal, we have evidence
of the presence of a constant error in at least one set.

The relative magnitude of the constant error in second-trial 
scores varies, but the following cases are probably typical. 
The Illinois General Intelligence Scale² was given twice to
several hundred pupils in Grades III to VIII inclusive. After
making due allowance for the inequality of the two forms of
this scale,³ the difference between the means of the two sets of
scores was approximately five points, or six months of mental 
age. In the eighth grade, in which somewhat unusual testing 
conditions appear to have prevailed, the difference was considerably

1 It will, of course, be necessary to make an appropriate allowance for any lack
of equivalence of the two forms in comparing the means.

2 Monroe, W. S. “The Illinois Examination,” University of Illinois Bulletin,
Vol. 19, No. 9, Bureau of Educational Research Bulletin, No. 6. Urbana:
University of Illinois, 1921, p. 69.

3 Ibid., p. 10.

greater. For Monroe’s General Survey Scale in
Arithmetic administered to the same groups, the difference
between the mean of the first-trial scores and that of the second-trial
scores was approximately 3.2 points in Grades III to V,
and 4.5 points in Grades VI to VIII. Two forms of the Thorndike-McCall
Reading Scale were given to several groups of
pupils. The mean of the first-trial scores (Form 2) was 47.78;
the mean of the second-trial scores (Form 3), 51.69.

When any considerable period of time elapses between two
trials on a given intelligence test, the instruction which pupils
receive during this interim may materially influence their
second-trial scores. In an investigation¹ by the Bureau of
Educational Research it was found that for a group of 134
children the mean increase of the second-trial scores on the
Illinois General Intelligence Scale over the first-trial scores
was equivalent to slightly more than four years in mental age.
The two trials were six months apart, and hence the normal
increase in mental age to be expected would be six months.
If we assume that the first-trial scores did not involve a constant
error, it follows that the constant error introduced in
the second-trial scores was somewhat greater than three and
one-half years of mental age. Investigation revealed that the
language instruction of these pupils during the period between
the two testings functioned as coaching for the test.
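
The arithmetic of this illustration may be set out explicitly. The following is only a sketch: the function name is hypothetical, the figure of 4.0 years is a round stand-in for the "slightly more than four years" reported above, and the calculation rests on the same assumption made in the text, namely that the first-trial scores are free of constant error.

```python
def estimated_constant_error(observed_gain_years, months_between_trials):
    """Observed mean gain in mental age minus the normal growth for the interval.

    Assumes the first-trial scores involve no constant error and that
    mental age normally advances one month per calendar month.
    """
    expected_gain_years = months_between_trials / 12.0
    return observed_gain_years - expected_gain_years


# 4.0 is a round stand-in for "slightly more than four years".
error_years = estimated_constant_error(observed_gain_years=4.0,
                                       months_between_trials=6)
print(error_years)  # 3.5
```

Since the observed gain was slightly more than four years, the estimate is slightly more than three and one-half years, which agrees with the statement in the text.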

Table VII gives certain gains in achievement which were 
obtained in an experiment to determine the relative effect of
the number of sections into which a class was divided.² The
six experimental groups were taught under conditions considered
to be the same with the exception of the difference in
sectioning. The tests used were Monroe’s Standardized Silent
Reading Test I, Revised, and Monroe’s General Survey Scale 
in Arithmetic. Form 1 of these tests was given early in Octo- 
ber, Form 2, the first of February, and Form 1 was again 

1 Monroe, op. cit., pp. 69-70.

2 Monroe, W. S. “The Constant and Variable Errors of Educational Measurements,”
University of Illinois Bulletin, Vol. 21, No. 10, Bureau of Educational
Research Bulletin, No. 15. Urbana: University of Illinois, 1923, p. 15.

administered the following May. The first gains were calculated
by subtracting the mean of the October scores from that
of the February scores; the second, by subtracting the mean
of the February scores from that of the May scores. The two
forms of these tests have been shown to be slightly lacking in
equivalence, especially in the case of reading rate.¹ The gains
in Table VII, however, are evidence of the presence of systematic
errors in addition to those resulting from the slight nonequivalence
of the different forms.

Table VII. Two Sets of Gains in Achievement Which Indicate the
Presence of Constant Errors in Certain Sets of Scores, Fifth Grade

         No. of     Reading Rate     Reading Comprehension      Arithmetic
Group    Pupils      I        II          I         II           I        II

  A         70     27.93   -15.78        .96        .35        23.82    21.45
  B         72      3.67    22.11       1.21       1.86        14.72     5.44
  C        326     -4.77    33.25        .92       2.06        12.07     6.36
  D        133     -6.60    22.90        .82        .95        17.04    10.09
  E        157      9.29    27.35       1.48       2.12        10.65     5.83
  F        143     -9.26    41.48        .08       2.36         4.69     5.38


On the basis of our knowledge of the practice effect of testing
we should expect the first gains to be larger than the second
gains unless the variations of experimental conditions materially
influenced the achievement of the pupils, which is extremely
unlikely. We find in both reading rate and reading comprehension
that the first gains are less than the ones for the second
period except for Group A. In three cases the first gain is
negative. In arithmetic the first gain is larger than the second
in all cases except one. The smaller gains during the first
semester than during the second, and particularly the negative
gains, are indicative of the presence of a constant error in at
least one of the sets of scores from which the gains were computed.
The gains in reading rate shown for Group A are also

1 Monroe, W. S. “The Illinois Examination,” University of Illinois Bulletin,
Vol. 19, No. 9, Bureau of Educational Research Bulletin, No. 6. Urbana:
University of Illinois, 1921, pp. 12-18.

interesting: from October to February there is a very marked
increase in rate; for the second semester the gain is negative.
This suggests that the mean February score was too large,
i.e., it involved a positive constant error.
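
The comparisons described in the two preceding paragraphs can be checked mechanically. The sketch below uses the gains of Table VII with decimal points placed on the assumption that every entry follows the pattern of the clearly legible ones (27.93, 1.21, and so on); the function names are hypothetical and serve only this illustration.

```python
# (first gain, second gain) for each group, as read from Table VII.
# Decimal placement is an assumption wherever the printed point is missing.
reading_rate = {"A": (27.93, -15.78), "B": (3.67, 22.11), "C": (-4.77, 33.25),
                "D": (-6.60, 22.90), "E": (9.29, 27.35), "F": (-9.26, 41.48)}
reading_comprehension = {"A": (0.96, 0.35), "B": (1.21, 1.86), "C": (0.92, 2.06),
                         "D": (0.82, 0.95), "E": (1.48, 2.12), "F": (0.08, 2.36)}
arithmetic = {"A": (23.82, 21.45), "B": (14.72, 5.44), "C": (12.07, 6.36),
              "D": (17.04, 10.09), "E": (10.65, 5.83), "F": (4.69, 5.38)}


def first_gain_smaller(gains):
    """Groups whose first-period gain is smaller than the second-period gain."""
    return sorted(g for g, (first, second) in gains.items() if first < second)


def negative_first_gain(gains):
    """Groups whose first-period gain is negative."""
    return sorted(g for g, (first, _second) in gains.items() if first < 0)


print(first_gain_smaller(reading_rate))           # ['B', 'C', 'D', 'E', 'F']
print(first_gain_smaller(reading_comprehension))  # ['B', 'C', 'D', 'E', 'F']
print(first_gain_smaller(arithmetic))             # ['F']
print(negative_first_gain(reading_rate))          # ['C', 'D', 'F']
```

The output agrees with the text: Group A is the only exception in reading rate and comprehension, three first gains in reading rate are negative, and one group is the exception in arithmetic.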

In another investigation¹ conducted by the Bureau of Educational
Research the mean increases in mental age scores
during a period of six months for two groups of children, each
numbering about 3000, were found to be .4 years and .9 years.
During the next six months for the same two groups the increases
were 1.4 years and 1.0 years respectively. The normal
increase in mental age during either of these intervals is, of
course, six months. The obtained increases for the first period
might be expected to be somewhat greater because of the
presence of a constant error introduced by the practice effect
of testing. However, in one case the difference between the
first- and second-trial scores is less than six months, and in
both the increase is less than the corresponding differences
between the second- and third-trial scores. The facts of this
illustration become even more striking when we note that the
total of the two gains for the first group is 1.8 years and that
for the second 1.9. Thus, when the interval of twelve months
is considered, the total increase in mental age score is approximately
the same for the two groups. On the other hand, if
the two intervals of six months are taken, the increases in
mental age scores are radically different for the two groups.
Although our knowledge of mental growth is limited, it does
not appear possible to explain the inconsistencies noted except
on the basis of a constant error in some of the sets of scores.
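
The inconsistency may be seen at a glance when each gain is set beside the normal growth of six months. In the sketch below the first-period gain of the first group is taken as .4 years, the reading consistent with the reported totals of 1.8 and 1.9; the names used are hypothetical.

```python
NORMAL_GAIN_YEARS = 0.5  # normal growth in mental age over six months

# Mean gains in mental-age score, in years: (first six months, second six months).
# The .4 for the first group is the value consistent with the reported total of 1.8.
mean_gains = {
    "first group": (0.4, 1.4),
    "second group": (0.9, 1.0),
}


def summarize(first, second):
    """Twelve-month total and each period's excess over normal growth."""
    return {
        "total": round(first + second, 1),
        "first_excess": round(first - NORMAL_GAIN_YEARS, 1),
        "second_excess": round(second - NORMAL_GAIN_YEARS, 1),
    }


summaries = {name: summarize(*gains) for name, gains in mean_gains.items()}
for name, summary in summaries.items():
    print(name, summary)
```

The totals agree closely (1.8 and 1.9), while the excess over normal growth shifts from -0.1 to 0.9 year in the first group and from 0.4 to 0.5 year in the second.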

In each of the two preceding illustrations we have evidence 
of the presence of a constant error in at least some of the groups 
of test scores, but the cause is obscure. Furthermore, the exact 
magnitude of the constant error is unknown. The obscurity 
of the cause is due in part to the large number of teachers and 

1 Odell, C. W. “The Use of Intelligence Tests as a Basis of School Organization
and Instruction,” University of Illinois Bulletin, Vol. 20, No. 17, Bureau of
Educational Research Bulletin, No. 12. Urbana: University of Illinois, 1922.
78 pp.

pupils participating in each of these educational experiments.
The errors may have been due to changes in the interest and
attitude of the teachers and pupils toward the test. However,
it was not possible to secure any direct evidence on this point.
The fact that the cause is obscure makes the possible presence
of constant errors in such data a serious matter and tends to
arouse suspicions regarding the accuracy of the measurements
of achievement in large cooperative experiments.

In the illustrations cited in the preceding pages the tests used
were objective; that is, the test papers were scored by following
specific instructions which eliminated the necessity for exercising
judgment. In marking examination papers and other pupil
performances where the scorer is asked to exercise judgment,
much evidence has been collected to show that two persons are
likely to differ widely in the scores they assign to the same
pupil performances. These differences are due in part to the
presence of a systematic error resulting from the fact that one
of the scorers tends to be more liberal than the other. In an
investigation¹ several sets of pupil performances, for which the
scoring was rather highly subjective, were rated independently
by two persons under the supervision of a third. A portion of
one table from this report is reproduced as Table VIII to furnish
evidence of the presence of a systematic error in the scores
assigned by one or both of the scorers. The entries in the
column headed “difference of mean scores” were obtained by
subtracting the mean of the scores assigned by the second scorer
from the mean of those assigned by the first scorer. Some of
these differences are relatively large. It appears that a scorer
is not always consistent with respect to his systematic error.
Scorers Y and K show negative differences for two sets of papers
and a positive difference for a third set. In the same investigation,
eighty-six compositions were rated independently by
two persons using the Willing Scale for Measuring Written Composition.
The difference between the means of their scores was 6.7.

1 Monroe, W. S. “A Critical Study of Certain Silent Reading Tests,” University
of Illinois Bulletin, Vol. 19, No. 22, Bureau of Educational Research
Bulletin, No. 8. Urbana: University of Illinois, 1922. 52 pp.


Table VIII. Subjectivity of Scoring Reproductions by the
Word-Counting Method

                                     Number of              Difference of
Test              Form    Grade      Papers      Scorers    Mean Scores

Memory             I       IV           27        Y-K          - 5.1
Memory             I       VII         123        Y-K          - 7.5
Memory             II      VII          31        Y-K          + 4.1
Memory             II      IV          116        Y-C          - 2.0
Memory             I       IV           92        Y-C          - 9.9
Memory             II      VII         100        Y-C          - 8.2
Reproduction       I       IV           94        L-K          + 6.8
Reproduction       II      IV           68        L-K          + 4.7
Reproduction       II      IV           31        L-C          - 1.6
Reproduction       I       VII         117        M-F          - 0.5
Reproduction       II      VII         113        F-C          - 6.0
Brown              I       IV          111        T-My         +12.8
Brown              II      IV          110        T-My         + 6.9
Starch (No. 7)     I       VII         119        M-C          - 5.8
Starch (No. 6)     II      VII         121        M-C          - 2.0


When a test has been standardized for age groups and the
norms are used as a basis for translating the point scores into
age scores, any systematic error in the norms will introduce a
systematic error in the age scores. A similar statement may be
made relative to intelligence quotients. Furthermore, the intelligence
level indicated by a given IQ, say one of 112, depends
upon the test from which it was computed.¹

The magnitude of constant errors in test scores. The preceding
illustrations reveal a significant characteristic of constant
errors in test scores: they conform to no law. In the case

1 Kefauver, G. N. “Need of Equating Intelligence Quotients Obtained from
Group Tests,” Journal of Educational Research, 19:92-101, February, 1929.
See also:

Carroll, H. A., and Hollingworth, L. S. “The Systematic Error of Herring-Binet
in Rating Gifted Children,” Journal of Educational Psychology, 21:1-11,
January, 1930.

Cattell, Psyche. “Why Otis’ ‘I.Q.’ Cannot Be Equivalent to the Stanford-Binet
IQ,” Journal of Educational Psychology, 22:599-603, November, 1931.

of a given test it is not possible to make a general determination
comparable to the probable error of measurement which
will be applicable to all groups of scores obtained from administering
it. When first-trial scores are compared with second-trial
scores, the latter are usually found to involve a positive
constant error, but the magnitude of the error varies and it may
be negative. For certain tests Thorndike¹ has reported the effect
of the practice due to the repetition of the test to be about 10
per cent of the mean magnitude, but the relative magnitude
may be expected to vary with the experience of the subjects,
the type of test, and its structure. Furthermore, practice is
only one of several causes that contribute to the constant error
in a group of test scores. Hence, in a particular case, even an
experienced investigator can only estimate roughly the magnitude
of the constant error in a group of test scores. This fact
makes constant errors especially troublesome in educational
research.

Systematic errors in other types of data. In making estimates 
of traits by means of rating scales, a phenomenon may occur 
which is called the “halo effect.”² For example, let us suppose
that the rater recognizes that the individual being rated is
outstanding in a given trait. He may be, in the estimation of
the rater, the most honest or dishonest person the rater has
ever known. This high estimate or low estimate of one characteristic
tends to cause systematic errors in the ratings of other
traits of this individual. The rater is biased with respect to him.
Somewhat the same phenomenon has been observed when the
individuals rated are well known by the rater. Friendship may
tend to bias the scores assigned, and in some cases familiarity
may breed contempt.³

1 Thorndike, E. L. “Tests of Intelligence, Reliability, Significance, Susceptibility
to Special Training and Adaptation to the General Nature of the Task,”
School and Society, 9:189-95, February 15, 1919.

2 Thorndike, E. L. “A Constant Error in Psychological Ratings,” Journal of
Applied Psychology, 4:25-29, March, 1920.

3 See Knight, F. B. “The Effect of the ‘Acquaintance Factor’ upon Personal
Judgments,” Journal of Educational Psychology, 14:129-42, March, 1923.

(Continued next page)

Persons who possess a given trait in a high degree tend to
under-rate themselves, and persons who are very deficient with
respect to the trait tend to over-rate themselves. Cogan,
Conklin, and Hollingworth¹ have reported that individuals
tend to over-rate themselves on desirable traits and under-rate
themselves on undesirable traits. Remmers² has shown that
distinguished students tend to under-rate themselves when
their self-ratings are compared with ratings made by other
persons. On the other hand, the differences between self-ratings
of undistinguished students and ratings of them made by other
persons are evidences of variable rather than systematic errors.
The student interested in the errors of self-ratings should also
consult the researches of Cattell,³ Hoffman,⁴ Hurlock,⁵ Shen,⁶
Trow and Pu,⁷ and the summary of the validity of self-ratings
given by Symonds.⁸

It is well known that systematic errors are a frequent limitation
of questionnaire data. The wording or arrangement of the
questionnaire items may cause a systematic error. For example,
Mathews has reported that respondents tend to give a higher
per cent of affirmative replies in a questionnaire of the multiple-response
type to the items printed on the extreme left position.⁹

Shen, Eugene. “The Influence of Friendship upon Personal Ratings,” Journal
of Applied Psychology, 9:66-68, March, 1925.

See also discussion of rating in Chapter III on pages 51-55.

1 Cogan, L. C., Conklin, A. M., and Hollingworth, H. L. “An Experimental
Study of Self-Analysis, Estimates of Associates, and the Results of Tests,”
School and Society, 2:171-79, July 31, 1915.

2 Remmers, H. H. “Distinguished Students: What They Are and Why,”
Studies in Higher Education, 15, Bulletin of Purdue University, Vol. 31, No. 2.
Lafayette, Indiana: Purdue University, 1930. 36 pp.

3 Cattell, J. McK. American Men of Science. New York: Science Press, 1927.
1132 pp. (First edition, 1906.)

4 Hoffman, G. J. “An Experiment in Self Estimation,” Journal of Abnormal
and Social Psychology, 18:43-49, April, June, 1920.

5 Hurlock, E. B. “A Study of Self-Ratings by Children,” Journal of Applied
Psychology, 11:490-502, December, 1927.

6 Shen, Eugene. “The Validity of Self-Estimate,” Journal of Educational
Psychology, 16:104-07, February, 1925.

7 Trow, W. C., and Pu, A. S. T. “Self-Ratings and the Chinese,” School and
Society, 26:213-16, August 13, 1927.

8 Symonds, P. M. Diagnosing Personality and Conduct. New York: The Century
Company, 1931, pp. 109-11.

9 Mathews, C. O. “The Effect of the Order of Printed Response Words on an

Systematic errors may occur in other types of educational
research data. In historical research the investigator may
record information which supports a point that he is seeking to
establish to the neglect of evidence in opposition. In preparing
a critical summary of research the investigator may neglect, or
criticize unduly, research which opposes his own point of view.
Although such limitations of data are somewhat different from
systematic errors in test scores, it seems justifiable to apply
the term to them. They represent a particularly vicious form of
error since the investigator is unlikely to recognize their presence.

C. Errors of Validity

The cause of variable errors of validity. The general cause 
of errors of validity was given on page 129 as the use of substi- 
tute data or indirect measurement. Our understanding of this 
cause will be augmented by considering the types of substitutions
that are commonly made. Whenever the data collected
are not precisely those that are specified by the label attached
to them or to derived statistics or implied in the interpretation,
a substitution is introduced. When measures of human abilities
and traits are specified or implied by the label, the following
appear to be the more important types of substitution:

1. A measure of one achievement substituted for a measure of a
different achievement. This case includes the substitution of a
measure of an ability functioning under certain conditions for a
measure of what is called the same ability functioning under
different conditions.

2. A measure of a sample of the achievement substituted for a
measure of the whole.

3. A measure of a combination of intelligence and acquired ability
substituted for a measure of the results of instruction, usually
restricted to certain instruction.

4. A measure of immediate ability substituted for a measure of the
residue of ability at a later date.

The first type of substitution is very general. A test measures
directly the ability to respond to the exercises of the test

Interest Questionnaire,” Journal of Educational Psychology, 20:128-34, February,
1929.

under the conditions of its administration. It is not easy to
specify this achievement. Its nature is suggested by the content
of the test and the conditions attending its administration,
but inconspicuous features may be influential in determining
what is measured directly. The influence of subtle characteristics
of test exercises is illustrated in studies of true-false tests.
It has been shown that certain words and phrases are characteristic
of many exercises for which the answer “false” should be
made. For example, statements containing “always” or
“never” are false in two out of three cases.¹ In fact, in one
investigation² they were found to be false in three out of four
cases. Other words and phrases tend to “determine” the response
“true.” Brinkemeier and Keys³ secured evidence indicating
that a general characteristic called circumstantiality
operates to suggest a response of “true.” It appears, therefore,
that in the case of “test-wise” students a true-false test is
likely to measure directly a combination of factual information
and shrewd inference. Hence, when scores yielded by such tests
are considered as measures of dynamic or usable knowledge, the
possibility of relatively large variable errors of validity is created.

Although we do not have similar critical studies for other 
types of tests, it is likely that the ability directly measured is 
determined in part by subtle characteristics of the measuring 
instrument employed. Hence, it is difficult, perhaps impossible,
to determine with precision what a test measures directly.
When there is a relatively large amount of writing, motor skill
is likely to be involved.⁴

1 Weidemann, C. C. “How to Construct the True-False Examination,”
Teachers College, Columbia University Contributions to Education, No. 225.
New York: Bureau of Publications, Teachers College, Columbia University,
1926. 118 pp.

2 Brinkmeier, I. H., and Ruch, G. M. “Minor Studies on Objective Examination
Methods. III. Specific Determiners in True-False Statements,” Journal
of Educational Research, 22:110-18, September, 1930.

3 Brinkemeier, I. H., and Keys, Noel. “Circumstantiality as a Factor in
Guessing on True-False Examinations,” Journal of Educational Psychology,
21:681-94, December, 1930.

4 For an illustration of an attempt to correct obtained scores on a speed test for
the effect of motor skills, see Courtis, S. A., and Thorndike, E. L. “Correction
Formulae for Addition Tests,” Teachers College Record, 21:1-24, January, 1920.

Usually a test measures only a relatively small segment of
achievement. Hence, in addition to substituting measures of
one achievement for measures of another achievement, a measure
of a small segment of achievement is substituted for a
measure of a larger segment. Unless the sample is representative
of the whole in the case of the persons taking the test, the
variable errors of validity are increased by this substitution.
When the achievement to be measured is defined as the result
of instruction, there is a further contribution to the variable
errors of validity because a test usually measures directly a
combination of intelligence and acquired ability and the relative
proportion of these factors varies from person to person. If the
achievement to be measured is thought of as being relatively
permanent, there is likely to be a still further contribution.

Causes of systematic errors of validity in test scores. When a
substitution of data is made, it is not necessary that the mean
of the substitute measures be numerically equal to the mean
of the specified measures. In fact, in the case of test scores
little effort has been made to establish a standard unit of measurement
without which numerical equivalence could not be
expected. Hence, when a researcher is concerned with only one
set of data, it is not necessary to give attention to the question
of the presence of a systematic error of validity. It is only when
norms are introduced or two or more sets of measures are being
compared that this question requires consideration. The substitution
will not introduce a systematic error of validity if the
ratio of the mean of substituted measures to the mean of the
specified measures is the same for all groups of measures. For
example, the substitution of scores on a mere information test
for measures of the total achievement in such a subject as
physics will not introduce a systematic error of validity if the
ratio of the mean of the information test scores to the mean of
measures of the total achievement is the same for the several
populations. Hence, the causes of systematic errors of validity
are to be sought in the conditions that produce fluctuations in
the ratio of these means.
Systematic errors of measurement will affect this ratio, but it
is also affected by variations in the relative magnitude of what
the test actually measures and of the ability or trait specified.
For example, under a given curriculum and plan of instruction
calculation achievement in arithmetic bears a certain ratio to
problem-solving achievement. Under these conditions the mean
score on a calculation test may be considered a valid measure
of the average problem-solving achievement. If the curriculum
or general plan of instruction is modified with the result that
this ratio is changed, the mean score resulting from a second
administration of the calculation test will involve a constant
error of validity when used as a comparable measure of the
average problem-solving ability. If the curriculum or plan of
instruction is different for two groups, the mean score of one
group will probably involve a constant error of validity. As
another illustration, suppose an information test is being used
to measure the total achievement in such a subject as English
literature. The relation existing between the mean score on the
information test and the average total achievement on one date
may be changed as the result of the instructor’s emphasis upon
information objectives and the efforts of the students to become
able to respond to information tests. If this happens, a significant
systematic error of validity will be introduced in the scores
made on such a test at a later date. This systematic error will
be in addition to that designated as practice effect.

The repeated administration of a particular type of test will
tend to cause pupils to make the ability to respond to such tests
their objective¹ and hence is likely to affect the ratio of what is
measured directly to other achievement. This ratio is likely to
be affected also by the amount and recency of instruction relating
to the topics covered by the test. Hence, in considering the
possibility of systematic errors of validity, it is necessary to 
note not only the curriculum and general plan of instruction 
but also less obvious conditions such as those just mentioned. 

1 Meyer, George. “An Experimental Study of the Old and New Types of
Examination,” Journal of Educational Psychology, 26:30-40, January, 1935.

The measurement of variable errors of validity in test scores.
If criterion measures are available,¹ the coefficient of correlation
between them and the scores yielded by the test ($r_{1c}$) is an index
of the total variable error in the test scores and the variable
error in the criterion. This coefficient is equal to the product of
the coefficient of correlation between true scores and true criterion
measures and the square roots of their reliability coefficients:²

$$r_{1c} = r_{x_\infty c_\infty}\sqrt{r_{1I}}\,\sqrt{r_{cC}}$$

The true coefficient of validity is given by the formula³

$$r_{x_\infty c_\infty} = \frac{r_{1c}}{\sqrt{r_{1I}}\,\sqrt{r_{cC}}}$$

Although $r_{1c}$ is commonly referred to as the coefficient of validity,
it actually is the coefficient of reliability and validity.* If the
criterion is fallible, that is, involves variable errors of measurement,
$r_{1c}$ indicates a degree of validity that is too low. In
such cases the following formula should be used in determining
the coefficient of validity:

$$r_{1c_\infty} = \frac{r_{1c}}{\sqrt{r_{cC}}}$$

The coefficient of validity, $r_{1c}$ or $r_{1c_\infty}$, is unsatisfactory as a
measure of the variable errors of validity for the reasons given
in connection with the coefficient of reliability. By an argument
similar to that on pages 132-33, it is possible to derive a formula
for the median deviation (probable error) of the differences
$X_1 - X_c$, but since criterion measures of known reliability are
seldom obtainable, the formula does not have much application.
See Chapter XI for an interpretation of the coefficient of correlation
between the scores yielded by a test and criterion measures.
1 Satisfactory criterion measures are usually difficult or impossible to obtain.
See Chapter VII, pages 207-08.
2 See page 209.
3 See page 161.

* Validity is sometimes defined as inclusive of reliability. When this definition
is accepted, this distinction is not justified.

The magnitude of systematic errors of validity. A coefficient
of validity does not reveal whether a systematic error of validity
is involved. It is a measure of the variable errors of
validity. In fact, the term “validity,” as it is commonly used,
does not refer to systematic errors. There are no definite techniques
for determining the magnitude of systematic errors of
validity, and estimates of experienced persons are not likely to
be very dependable. About all that can be done is to indicate
cases in which a systematic error is probable and to limit interpretations
accordingly.¹

D. Effect of Data Faults

The effect of errors upon statistics.² Many persons appear
to believe that the effect of errors in the data upon the mean,
median, standard deviation, coefficient of correlation, and other
statistics becomes negligible when the number of cases is sufficiently
large. This is only partially true. A constant error in
the original data makes the mean in error by the same amount,
and any increase in the number of cases does not decrease the
magnitude of this effect, provided the constant error remains
the same as the number of cases is increased.³ The same situation
prevails for the median. On the other hand, a constant
error does not affect the standard deviation and other measures
of variability. Neither does it affect the coefficient of
correlation. Systematic errors which are proportional to the
size of the score do not affect the coefficient of correlation, but
other types of systematic errors may affect it.
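
These statements are easily verified on a small set of figures. The scores below are hypothetical and invented only for the illustration; the helper pearson_r is likewise a name introduced here, and it computes the ordinary product-moment coefficient.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores, invented only for this illustration.
scores = [12, 15, 19, 22, 27, 30]
criterion = [40, 38, 47, 52, 50, 61]

constant = [x + 5 for x in scores]                 # constant error of +5 points
proportional = [x * 1.10 for x in scores]          # error proportional to the score
uneven = [x + 8 if x > 20 else x for x in scores]  # error confined to the higher scores

mean_shift = statistics.fmean(constant) - statistics.fmean(scores)
median_shift = statistics.median(constant) - statistics.median(scores)
sd_change = statistics.pstdev(constant) - statistics.pstdev(scores)
print(round(mean_shift, 6), median_shift, round(sd_change, 6))

r_original = pearson_r(scores, criterion)
for label, altered in (("constant", constant),
                       ("proportional", proportional),
                       ("uneven", uneven)):
    print(label, round(pearson_r(altered, criterion) - r_original, 4))
```

The mean and the median are each displaced by the full five points, the standard deviation is unchanged, and the coefficient of correlation is unchanged under the constant and the proportional errors but not under the error confined to the higher scores.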

The variable errors in an unselected group of data tend to 
form a normal distribution whose mean is zero. The median 

1 For further consideration of the magnitude of systematic errors of validity, 
see Chapter VIII, page 149 

² A different type of treatment of this topic for certain statistics is given by
Bowley, A. L. Elements of Statistics. Third Edition. New York: Charles
Scribner’s Sons, 1917, pp. 203-14.

³ Under certain conditions, systematic errors tend to become variable errors
as the number of cases is increased. This probably occurs in a large cooperative
investigation in which tests are administered by a number of different persons.
deviation of this distribution, properly called the probable error,
may be calculated as indicated by the formula¹

$$PE_{1\cdot\infty} = .6745\,\sigma_1\sqrt{1 - r_{1I}}$$

In this formula $r_{1I}$ is the coefficient of reliability and $\sigma_1$ is the
standard deviation of the obtained measures.²

The effect of variable errors upon the mean cannot be determined.
It is possible only to determine the probable limits
of the effect for a given probability. The formula is

$$PE_{\infty M_1} = \frac{.6745\,\sigma_1\sqrt{1 - r_{1I}}}{\sqrt{N}}$$

Since the square root of N appears in the denominator, we
may say that the effect of variable errors upon the mean varies
inversely as the square root of the number of cases. This relationship
is doubtless the basis of the belief that the effect
of errors in the data tends to become negligible when the
number of cases is large.
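
As an illustration of the two formulae just given, the sketch below takes the probable error of measurement of a single score to be .6745 times the obtained standard deviation times the square root of one minus the coefficient of reliability, and divides it by the square root of N to obtain the probable error of measurement of the mean. The function names and the numerical values are hypothetical and are not taken from the text.

```python
import math


def pe_measurement(sd, reliability):
    """Probable error of measurement of a single obtained score."""
    return 0.6745 * sd * math.sqrt(1.0 - reliability)


def pe_mean_measurement(sd, reliability, n):
    """Probable effect of variable errors of measurement upon a mean of n scores."""
    return pe_measurement(sd, reliability) / math.sqrt(n)


# Hypothetical test: standard deviation of 10 points, reliability of .84.
sd, reliability = 10.0, 0.84
pe_score = pe_measurement(sd, reliability)

# The effect upon the mean shrinks only as the square root of N:
# quadrupling the number of cases halves it.
pe_by_n = {n: pe_mean_measurement(sd, reliability, n) for n in (25, 100, 400)}

print("probable error of a single score:", round(pe_score, 4))
for n, pe in pe_by_n.items():
    print("N =", n, " probable error of the mean:", round(pe, 4))
```

The halving at each fourfold increase of N is what gives rise to the belief discussed above, but it applies to variable errors only.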

The probable error of measurement of the difference between
two means is given by

$$PE_{\infty D}^2 = PE_{\infty M_1}^2 + PE_{\infty M_2}^2 - 2\,r_{12}\,PE_{\infty M_1}\,PE_{\infty M_2}$$

If the two sets of measures are uncorrelated, the product term
becomes zero.

The probable error of measurement of individual gains (differences
between scores on comparable forms of a test) is given by³

$$PE_{\infty G} = .6745\,\sigma_1\sqrt{2 - r_{1I} - r_{2II}}$$
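
The two formulae above may be sketched as follows. The sketch assumes that the probable error of the difference between two means is the square root of the sum of the squared probable errors less twice the product term, and that the probable error of a gain is .6745 times the standard deviation times the square root of two minus the two coefficients of reliability. All names and values are hypothetical.

```python
import math


def pe_difference_of_means(pe_m1, pe_m2, r12=0.0):
    """Probable error of measurement of the difference between two means.

    pe_m1 and pe_m2 are the probable errors of measurement of the two means;
    r12 is the correlation between the two sets of measures.
    """
    return math.sqrt(pe_m1 ** 2 + pe_m2 ** 2 - 2.0 * r12 * pe_m1 * pe_m2)


def pe_gain(sd, reliability_1, reliability_2):
    """Probable error of measurement of an individual gain between two forms."""
    return 0.6745 * sd * math.sqrt(2.0 - reliability_1 - reliability_2)


# Uncorrelated measures: the product term vanishes.
pe_uncorrelated = pe_difference_of_means(0.3, 0.4)
# Correlated measures: the product term reduces the probable error.
pe_correlated = pe_difference_of_means(0.3, 0.4, r12=0.5)
# Gain on two forms with reliabilities of .85 and .83 (hypothetical values).
pe_individual_gain = pe_gain(10.0, 0.85, 0.83)

print(round(pe_uncorrelated, 4), round(pe_correlated, 4), round(pe_individual_gain, 4))
```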

¹ For the standard error of the probable error of measurement, see Kellogg,
C. E., and Spence, K. W. “Note on the Standard Errors of Estimate and
Measurement,” Journal of Educational Psychology, 22: 313-15, April, 1931.

² If this standard deviation differs significantly from the standard deviations
used in calculating the coefficient of reliability, then the coefficient of reliability
should be corrected by means of Kelley’s formula for the relation between ranges
in obtained scores and reliability coefficients. See pages 110-11 or Kelley, T. L.
Statistical Method. New York: The Macmillan Company, 1923, p. 222.

³ For derivation of formula see

Kelley, T. L. “A New Method for Determining the Significance of Differences
in Intelligence and Achievement Scores,” Journal of Educational
Psychology, 14: 321-33, September, 1923.

Holzinger, K. J. “Note on Professor Kelley’s Formula for Determining the
Variable errors tend to make the standard deviation and
other measures of variability larger than they would be otherwise.
The relation between the obtained standard deviation
and the true standard deviation is given by the following
formula:

$$\sigma_\infty = \sigma_1\sqrt{r_{1I}}$$

If the coefficient of reliability is known, this formula provides
a means for correcting the calculated value of σ for the effect
of variable errors. Since N does not appear in the formula, it
follows that increasing the number of cases does not reduce the
effect of the variable errors of measurement upon the standard
deviation.
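
A brief sketch of this correction follows, assuming that the true standard deviation is the obtained standard deviation multiplied by the square root of the coefficient of reliability. The function name and the values are hypothetical.

```python
import math


def true_sd(obtained_sd, reliability):
    """Standard deviation of true scores estimated from the obtained one."""
    return obtained_sd * math.sqrt(reliability)


# Hypothetical case: obtained standard deviation of 12, reliability of .81.
obtained = 12.0
corrected = true_sd(obtained, 0.81)

# The number of cases is not an argument of the function: the inflation of the
# standard deviation by variable errors is the same for 50 cases as for 5,000.
print("obtained:", obtained, " corrected:", round(corrected, 4))
```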

The presence of variable errors in the data tends to decrease
the coefficient of correlation, an effect known as “attenuation.”
Several formulae¹ have been employed to correct for the effect
of variable errors of measurement. The following one contributed
by Spearman² is frequently used:

$$r_{\infty\omega} = \frac{r_{12}}{\sqrt{r_{1I}}\,\sqrt{r_{2II}}}$$

The use of the formula may be illustrated as follows:

$$r_{12} = .59,\qquad r_{1I} = .85,\qquad r_{2II} = .83$$

$$r_{\infty\omega} = \frac{.59}{\sqrt{.85}\,\sqrt{.83}} = .70$$
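
The illustration may be verified with a few lines of code. This is a sketch only; the function name is invented, and the formula is taken to be the obtained coefficient divided by the product of the square roots of the two coefficients of reliability.

```python
import math


def correct_for_attenuation(r12, reliability_1, reliability_2):
    """Spearman's correction: estimated correlation between true scores."""
    return r12 / (math.sqrt(reliability_1) * math.sqrt(reliability_2))


# Values from the illustration in the text.
corrected = correct_for_attenuation(0.59, 0.85, 0.83)
print(f"{corrected:.2f}")  # prints 0.70
```

With very low reliabilities the corrected value can exceed 1.00, which is one practical sign that the assumptions behind the formula are not satisfied in the data at hand.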


The derivation of this formula assumes that the variable 
errors in the two sets of measures are uncorrelated with each 

Significance of Differences,” Journal of Educational Psychology, 16: 48-51,
January, 1925.

Kelley, T. L. “Professor Kelley’s Reply,” Journal of Educational Psychology,
16: 52-55, January, 1925.

¹ For a list of the formulae used in correcting for attenuation, see
Dunlap, J. W., and Kurtz, A. K. Handbook of Statistical Nomographs, Tables,
and Formulas. Yonkers-on-Hudson: World Book Company, 1932, p. 127.

² Spearman, C. “The Proof and Measurement of Association between Two
Things,” American Journal of Psychology, 15: 90, January, 1904.

Spearman, C. “Demonstration of Formulae for True Measurement of Correlation,”
American Journal of Psychology, 18: 161, 1907.

Spearman, C. “Correlation Calculated from Faulty Data,” British Journal
of Psychology, 3: 271-95, 1910.
other and with the true measures with which they are combined.
Brown and Thomson¹ have pointed out that these
assumptions are frequently not satisfied. Hence, the corrected
coefficient obtained should not be thought of as a highly precise
determination. Correction for attenuation by means of the
above formula may be regarded as yielding a best estimate of
the coefficient which would be obtained if we could calculate
it from true scores.

If the coefficient of reliability is known, it is possible to calculate
estimated true measures, commonly called regressed
measures. If $\bar{X}_\infty$ is used to designate the estimated true measures,
the formula² is

$$\bar{X}_\infty = r_{1I}X_1 + (1 - r_{1I})M_1$$

The regressed measures are only estimates of the true measures.
They involve variable errors, but these are smaller than in the
original measures. Their probable error is given by the expression
$.6745\,\sigma_1\sqrt{r_{1I} - r_{1I}^2}$ instead of $.6745\,\sigma_1\sqrt{1 - r_{1I}}$.
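
The calculation of a regressed measure may be sketched as below. The sketch assumes that the estimated true measure is the obtained score weighted by the coefficient of reliability plus the group mean weighted by one minus that coefficient, and that its probable error is .6745 times the standard deviation times the square root of the reliability less its square. The names and values are hypothetical.

```python
import math


def regressed_score(score, group_mean, reliability):
    """Estimated true measure: the obtained score pulled toward the group mean."""
    return reliability * score + (1.0 - reliability) * group_mean


def pe_regressed(sd, reliability):
    """Probable error of a regressed measure."""
    return 0.6745 * sd * math.sqrt(reliability - reliability ** 2)


def pe_obtained(sd, reliability):
    """Probable error of measurement of an obtained score, for comparison."""
    return 0.6745 * sd * math.sqrt(1.0 - reliability)


# Hypothetical pupil: score of 70 in a group with mean 50, SD 10, reliability .80.
estimate = regressed_score(70.0, 50.0, 0.80)
pe_est = pe_regressed(10.0, 0.80)
pe_raw = pe_obtained(10.0, 0.80)

print("regressed measure:", round(estimate, 2))
print("probable error, regressed:", round(pe_est, 4), " obtained:", round(pe_raw, 4))
```

The ratio of the two probable errors is the square root of the reliability, so the gain in precision from regressing is small when the test is already highly reliable.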

The preceding discussion has dealt with the effect of errors
of measurement. The general statements relative to the effect
of variable errors and the effect of constant or systematic errors
are also applicable to errors of validity. In the absence of
appropriate criterion measures the effect of variable errors of
validity can only be estimated. It should be noted, however,
that the errors to be considered in a given case depend upon the
label given the calculated value or implied in its interpretation.
For example, teachers’ marks are known to be fallible measures
of achievement, but if a coefficient of correlation calculated
from such data is interpreted in terms of the relationship between
two sets of marks, the only errors to be considered are
the accidental ones that may have occurred in copying the
marks from the school records or in handling them. If, however,

¹ Brown, William, and Thomson, G. H. The Essentials of Mental Measurement.
Cambridge University Press, 1921, p. 158.

² This is essentially the regression equation connecting $X_1$ and $X_I$. It is assumed
that $M_1 = M_I$ and $\sigma_1 = \sigma_I$. See Kelley, T. L. Interpretation of Educational
Measurements. Yonkers-on-Hudson: World Book Company, 1927, p. 178.
the interpretation is to be made in terms of the relationship
between the achievements actually represented by the marks,
their reliability must be considered, and if the interpretation
is in terms of the relationship between specified achievements,
both reliability and validity must be considered.

Recognizing and making allowances for non-agreement with
assumptions. At several places in Chapter IV attention was
called to an assumption made in the derivation of a formula or
to one introduced in interpreting the statistic. When the mean
or standard deviation is calculated from a frequency distribution,
the assumption is made that the measures within an
interval are distributed so that the mid-points of the intervals
may be taken as their average value. In the calculation of the
mean, the positive deviations from this assumption will frequently
be approximately equal to the negative ones and hence
the mean will be unaffected. Sometimes, however, non-agreement
with this assumption may have a material effect
upon the calculated value of the mean. In calculating the
standard deviation, the deviations from the mean are squared
and hence there is no neutralization of the effects of non-agreement
with the assumption. In frequency distributions
that are normal or approximate the normal, the tendency is
for the actual mean of the measures in an interval to fall nearer
the mean of the entire distribution than the mid-point of the
interval. When the grouping is coarse, i.e., when the interval
is relatively large, the error thus introduced is important enough
to require attention if precise results are desired. Sheppard’s
correction formula¹ for the standard deviation is

$$\sigma_{\text{corrected}} = \sqrt{\sigma^2 - \frac{i^2}{12}}$$

¹ Sheppard, W. F. “On the Calculation of the Average Square, Cube, etc., of
a Large Number of Magnitudes,” Journal of the Royal Statistical Society, 6: 698-703,
1897.

Sheppard, W. F. “On the Calculation of the Most Probable Values of
Frequency Constants for Data Arranged According to Equidistant Divisions
of a Scale,” Proceedings of the London Mathematical Society, 29: 353-80,
1898.

Pearson, Karl, et al. “On the Elementary Proof of Sheppard’s Formulae for
The symbol i designates the number of units in the class interval.

Since the standard deviation is involved in the calculation
of the coefficient of correlation, coarse grouping also affects the
value of r obtained. Sheppard’s correction for a coefficient of
correlation is given by the following formula:¹

$$r_{\text{corrected}} = \frac{r_{12}\,\sigma_1\sigma_2}{\sqrt{\left(\sigma_1^2 - \dfrac{i_1^2}{12}\right)\left(\sigma_2^2 - \dfrac{i_2^2}{12}\right)}}$$
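
Sheppard’s corrections for the standard deviation and for the coefficient of correlation may be sketched as follows. The sketch assumes the usual form of the correction, in which one twelfth of the squared class interval is subtracted from the squared standard deviation, with the standard deviation and the interval expressed in the same units. The function names and the values are hypothetical.

```python
import math


def sheppard_sd(sd, interval):
    """Standard deviation corrected for grouping into class intervals."""
    radicand = sd ** 2 - interval ** 2 / 12.0
    if radicand <= 0:
        raise ValueError("interval too coarse relative to the standard deviation")
    return math.sqrt(radicand)


def sheppard_r(r, sd1, interval1, sd2, interval2):
    """Coefficient of correlation corrected for grouping in both variables."""
    return r * sd1 * sd2 / (sheppard_sd(sd1, interval1) * sheppard_sd(sd2, interval2))


# Hypothetical grouped data: SDs of 10 and 8 with class intervals of 5 and 4.
sd_corrected = sheppard_sd(10.0, 5.0)
r_corrected = sheppard_r(0.60, 10.0, 5.0, 8.0, 4.0)

print("corrected standard deviation:", round(sd_corrected, 4))
print("corrected r:", round(r_corrected, 4))
```

The correction lowers the standard deviation and therefore raises r slightly. When the standard deviations are already expressed in interval units, the interval arguments are simply 1.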


When the unevenness of the distribution of the measures
within an interval is due to other causes, there is no general
procedure for making allowance for non-agreement with the
assumption of uniform distribution. If non-agreement with
this assumption is suspected, distributions may be formed for
two or more different choices of interval points. If the calculations
from the different distributions give essentially the
same results, there is probably satisfactory agreement with the
assumption. If the results differ,² the investigator faces the
problem of determining which result is most nearly correct.
In particular cases, an alert and resourceful investigator may
be able to modify the plan of calculation so as to secure a
more correct result. For example, in handling distributions of
teachers’ salaries the points of concentration may be noted and
the calculations modified accordingly.
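
The check described above, forming distributions for two different choices of interval points and comparing the results, may be sketched as follows. The salary figures are invented and are deliberately concentrated at round amounts; the function name is hypothetical.

```python
import statistics
from collections import Counter


def grouped_mean(values, width, start):
    """Mean computed from a frequency distribution.

    Each measure is placed in the interval [start + k*width, start + (k+1)*width)
    and is then treated as if it lay at the mid-point of that interval.
    """
    counts = Counter((v - start) // width for v in values)
    total = sum(counts.values())
    return sum((start + (k + 0.5) * width) * f for k, f in counts.items()) / total


# Hypothetical annual salaries, concentrated at round hundreds.
salaries = [1000, 1000, 1000, 1100, 1100, 1200, 1200, 1200, 1200, 1300, 1500, 1500]

mean_from_raw = statistics.fmean(salaries)
mean_start_1000 = grouped_mean(salaries, width=200, start=1000)
mean_start_900 = grouped_mean(salaries, width=200, start=900)

print("mean of the ungrouped salaries:", round(mean_from_raw, 2))
print("grouped, intervals beginning at 1000:", round(mean_start_1000, 2))
print("grouped, intervals beginning at 900:", round(mean_start_900, 2))
```

In this invented case the two groupings disagree with each other and with the mean of the ungrouped salaries, because the salaries pile up at the lower limits of the intervals rather than spreading about the mid-points. A disagreement of this kind is the signal that the mid-point assumption is not satisfied.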

Linearity of relationship is assumed in computing the product- 
moment coefficient of correlation. Identification of non- 
conformity with this assumption is accomplished by means of 
the Blakeman test described on page 102. When the relationship
is not sufficiently linear, the correlation ratio (η) should be
used.


Correcting Raw Moments, and on Other Allied Points,” Biometrika, 3: 308-12,
1904.

¹ In this formula and the preceding one the σ’s are expressed in terms of the
scale of measurement. If they are in terms of intervals, $i_1$ and $i_2$ become unity.

² For an illustration in the case of the coefficient of correlation, see page 102.
The calculation of other statistics or their interpretation
frequently implies one or more assumptions. The reader interested
in this topic may locate the discussion of these assumptions
by referring to the Topical Index of this volume. Some
of the more important references are: coefficient of reliability
calculated by means of the Spearman-Brown formula, page 202;
calculation of coefficient of reliability for a greater or less range
of talent, page 110; probable error of a difference, pages 150
and 308 f.; partial correlation, pages 380 f.

Making allowances for non-representativeness of data.
There are two general types of non-representativeness: (1) that
caused by chance in random sampling, and (2) that due to
systematic influence. The effect of non-representativeness of
the second type can be calculated or estimated only when
supplementary data are available, and for such cases there is
no general technique.¹ The effect of chance in random sampling
was dealt with in the preceding chapter under the heading of
“The probable limits of the value of a statistic for a universe
when calculations have been made from a random sample,” and
hence the discussion here will be limited to certain supplementary
points.

The designation of a group of data as a sample means that
they have been taken from a larger population or universe.
In the case of some types of data it is possible to conceive of
two universes. In the case of test scores one consists of the
scores resulting from an infinite number of administrations
of the test to a given group of pupils under the same testing
conditions. Repeated administration of a test under the same
conditions is, of course, not possible, but such a universe may
be thought of, and the scores obtained from a single administration
under standard conditions may be taken as a random
sample of it. The other type of universe involves an unlimited
population of pupils. The probable error formulae
given on pages 104-05 are for this type of universe and they are

¹ Certain techniques for use in connection with the coefficient of correlation
have been described in Chapter IV, pages 110-11.
applicable only when the data qualify as a random sample of
it. A measure of the effect of using a sample from a universe
of the first type is obtained by means of the probable error
of measurement. It may be noted that successive samples
from the universe of performances are not independent but
correlated. This condition is shown by coefficients of reliability.

The fact that we may conceive of a universe of performances, 
as well as a universe of subjects, has resulted in considerable 
confusion. Any group of test scores may be thought of as a 
sample of the hypothetical universe of performances by the 
subjects tested. But a group of scores is not necessarily a 
random sample of a specified universe of subjects and some 
persons have unconsciously shifted in their thinking from a 
universe of performances to a universe of subjects. A clear 
distinction should be made between sampling errors (effect of 
chance in random sampling from a universe of subjects) and 
variable errors of measurement (effect of sampling a universe 
of performances).

The effect of chance in random sampling is just as likely to
make the calculated value of a statistic larger than the value
for the universe as it is to make it smaller. Hence, it is possible
only to calculate the probable limits of the value of the statistic
for the universe. For this purpose the probable error formulae
given in Chapter IV, pages 104-05, are commonly employed.
It should be noted, however, that the formulae for the probable
error of a mean, median, and standard deviation give the
probable effect of chance plus the probable effect of variable
errors of measurement.¹ If the data are fallible and it is desired
to obtain the probable error of a statistic due to chance in
random sampling alone, the following formulae must be used.²

¹ The coefficient of correlation may be corrected for the effect of variable
errors and when such correction has been made, the probable error formula
depends upon the attenuation formula used. See Kelley, T. L. Statistical
Method. New York: The Macmillan Company, 1923, pp. 210 f.

² Kelley, T. L. “Note upon Holzinger’s Formula for the Probable Error,”
Journal of Educational Psychology, 14: 376-77, September, 1923.

Huffaker, C. L., and Douglass, H. R. “On the Standard Errors of the Mean
$$PE_{M_\infty} = .6745\,\frac{\sigma_1}{\sqrt{N}}\,\sqrt{r_{1I}}$$

$$PE_{Mdn_\infty} = .8454\,\frac{\sigma_1}{\sqrt{N}}\,\sqrt{r_{1I}}$$

$$PE_{\sigma_\infty} = .4769\,\frac{\sigma_\infty}{\sqrt{N}}$$

Since the effect of variable errors of measurement must also
be considered, the above formulae are seldom used.
Another point to be noted is that the probable error of a
statistic due to sampling has no connection with systematic
errors of measurement, variable errors of validity, or systematic
errors of validity. Hence, the probable error of a statistic due
to the use of a random sample cannot be used as an index of
the effects of these other faults of the data. In other words,
when a statement is made of the probability that the value of
the statistic for the universe lies within certain limits, there
should be added the qualifying phrase “disregarding the effects
of systematic errors of measurement, variable errors of validity,
and systematic errors of validity.”
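
The relation between the ordinary probable error of a mean and its two components may be sketched as follows. The sketch assumes the ordinary formula .6745 times the obtained standard deviation divided by the square root of N, takes the part due to sampling alone as that value multiplied by the square root of the coefficient of reliability, and takes the part due to variable errors of measurement as that value multiplied by the square root of one minus the coefficient. The function names and values are hypothetical.

```python
import math


def pe_mean_ordinary(sd, n):
    """Ordinary probable error of a mean computed from obtained scores."""
    return 0.6745 * sd / math.sqrt(n)


def pe_mean_sampling(sd, reliability, n):
    """Part of the probable error due to chance in random sampling alone."""
    return 0.6745 * sd * math.sqrt(reliability) / math.sqrt(n)


def pe_mean_measurement(sd, reliability, n):
    """Part of the probable error due to variable errors of measurement."""
    return 0.6745 * sd * math.sqrt(1.0 - reliability) / math.sqrt(n)


# Hypothetical case: SD of 10, reliability of .84, 100 pupils.
sd, reliability, n = 10.0, 0.84, 100
ordinary = pe_mean_ordinary(sd, n)
sampling = pe_mean_sampling(sd, reliability, n)
measurement = pe_mean_measurement(sd, reliability, n)

# The two components combine by squares to give the ordinary probable error.
combined = math.sqrt(sampling ** 2 + measurement ** 2)

print("ordinary:", round(ordinary, 4))
print("sampling alone:", round(sampling, 4))
print("measurement alone:", round(measurement, 4))
print("components combined:", round(combined, 4))
```

None of the three values reflects systematic errors of measurement or errors of validity, which is why the qualifying phrase quoted above is needed.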

A concluding statement. The first sentence of this chapter
stated that the calculated value of a statistic may not be the
correct value, even when no arithmetical errors have been made
in the process, and the term “dependability” was introduced
to designate the degree of agreement between the calculated
value and the correct value of the statistic as labeled. The
discussion of the chapter has dealt with the nature and effects
of the various types of data faults, and it should be apparent
to the reader that when the precise nature of the label is considered
the value to which it is attached is very frequently not
correct. In some cases the magnitude of the error is relatively
large.

The facetious classification of statistics as representing the 
maximum degree of falsehood and deceit is indicative of the 

Due to Sampling and to Measurement,” Journal of Educational Psychology,
19: 643-49, December, 1928.
subtle and insidious nature of the causes contributing to undependability.
An untrained investigator not infrequently makes
unjustified interpretations because he is not aware of the undependability
of his findings; and a person interested in making
a case is likely to “discover” findings that appear to support
his beliefs or wishes. There are no definite techniques for
identifying the presence of systematic errors of either measurement
or validity, which are important data faults in the case
of central tendencies or differences between central tendencies.
The effect of variable errors of measurement upon the standard
deviation or upon the coefficient of correlation may be eliminated
if coefficients of reliability are known for the population
represented by the data. There is no technique for correcting
for the effect of variable errors of validity. The calculation of
the probable or standard error due to sampling tends to give
the impression of demonstrating the dependability of the
findings, but it is obvious that this is not true. Hence, a person
who has no intention of doing so may deceive himself or his
audience in regard to the dependability of his findings. Many
of the reported findings are not correct when the explicit or
implied label is considered, and hence one who reads reports
of studies should attempt to estimate the degree of dependability
of the reported findings. This, however, is not easy to
do. A critic is in a less favorable position than the one who
conducted the research, and unless a reader is supplied with adequate
information, he can only estimate the dependability
upon the basis of his general experience with data of the type
involved.



CHAPTER VI 


STUDYING THE PAST IN EDUCATION 

Historical research in education. Until about 1910 history of
education was a popular field of inquiry, and although during
recent years it has been overshadowed by the quantity production
of educational research in other fields, studying the
past in education still commands the attention of a number
of persons. Investigating the past in education is a fascinating
activity. To reveal a hitherto unknown origin of an idea or
practice contemporarily regarded as new is as satisfying to the
historical research worker as the discovery of a new organic
compound to the research chemist. To show that a goal formulated
for the education of the present is sanctioned by centuries-old
concepts of educational values is both satisfying and practical.
An aim or practice that is old is not necessarily worthy of
continued acceptance or use, but the test of years of thought
and trial cannot be ignored. Historical studies frequently contribute
to the understanding of present institutions and practices.
Historical findings may be as significant as those resulting
from surveys and experiments. It is probable that many current
absurdities in education would not be extant if the educators
to blame for them had heeded the lessons of the past. Furthermore,
historical research may be a source of inspiration to those
engaged in educational activities.

Historical problems in the field of education. The reader
who is not acquainted with this field of research can gain some
idea of the problems studied and the scope of the field by noting
the titles of the references in the illustrative bibliography at
the end of the chapter. Although no precise classification is
possible, certain types of problems may be mentioned. Some
studies are restricted to a historical account of a particular person
or particular educational institution. A problem of this type
is perhaps the simplest, and the success of the investigator depends
primarily upon locating sufficient authentic sources. The
report is largely a statement of the facts discovered in examining
the sources. A second type of problem calls for tracing
the development of education or a phase of it within a specified
geographical area. If this area is large and the development
varied in different sections, the investigator is likely to encounter
difficulty in generalizing. It is not easy to generalize
in surveys of present practices over a wide area. It is more
difficult to do so in a historical study. A third type of problem
calls for tracing the origin or development of a movement.¹
The inadequacy of available records frequently makes the determination
of origins difficult, and the ramifications of a movement
are not easy to identify.

Historical problems in the field of education are more varied
than these three types may indicate, but they are probably
sufficient to suggest the general nature of historical research
in this field. A problem of the first type does not call for extensive
training. After adequate sources are located, the principal
requirement is systematic and careful work. For the other
types of problems, a background acquaintance with educational

¹ Modern writers on historical method appear to be divided on the question
of whether or not the historical research worker should attempt to show cause
and effect relationships. For example, Teggart and Croce restrict the task of
the historical research worker to that of revealing the characteristics of events
of the past in their sequential order, while Bernheim, Fling, Langlois and
Seignobos, and Vincent sanction the making of inferences with respect to cause
and effect relationships.

Teggart, F. J. Theory of History. New Haven: Yale University Press, 1925,
pp. 61-68.

Croce, Benedetto. History, Its Theory and Practice. New York: Harcourt,
Brace and Company, 1921, pp. 64-82.

Bernheim, Ernst. Lehrbuch der Historischen Methode. Leipzig: Verlag von
Duncker und Humblot, 1894, pp. 492-522. (First edition, 1889.)

Fling, F. M. The Writing of History. New Haven: Yale University Press,
1920, pp. 146-50.

Langlois, Ch. V., et Seignobos, Ch. Introduction aux Études Historiques. Paris,
1898. (English translation by G. G. Berry, Henry Holt and Company, 1925,
pp. 285-95.)

Vincent, J. M. Historical Research. New York: Peter Smith, 1929, pp. 261-76.
(Reprinted from the 1911 edition of Henry Holt and Company.)
history is essential. Much interpretation of data is involved
and a person who does not have an adequate background is
likely to make many errors. He will also be handicapped in
locating sources.

The student contemplating a historical study as a thesis
should be cautioned against a hasty selection of a problem.
Accessibility of adequate sources is necessary. Hence, it is
unwise for a student to commit himself to a problem until
examination of the accessible source materials indicates with
reasonable certainty that with sustained effort and painstaking
work he will ultimately produce a satisfactory report. In the
general discussion of the definition of research problems in
Chapter II, the necessity of a clearly defined problem as a guide
for collecting data was emphasized. The definition of a problem
in historical research is not so essential, but it should be sufficiently
limited so that intensive study of it is possible. A defined
problem will aid one in locating sources and in most cases
it is necessary as a guide in selecting data from sources. A
well defined problem facilitates the organization of the report,
and most readers desire a definition as a means of orientating
their study of it.

The procedures of historical research.¹ The four phases of
educational research — defining the problem, collecting data, 
handling them, and interpreting the findings — appear in his- 
torical inquiry but obviously in a somewhat specialized form. 
In the case of problems relating to the remote past, the sources
of information are frequently limited and the problem must be
defined to fit the available data. When the sources are more
ample, definition of the problem takes the form of limiting the
scope of the proposed study. Frequently it is not possible to
define the problem by analyzing it into a series of subordinate
questions, at least not until a preliminary survey of the available
sources has been made. Collecting data is a conspicuous

¹ For a more extended account of this topic see Good, H. G. “Historical Research
in Education,” Educational Research Bulletin, 9: 7-18, 39-47, 74-78,
January 8, January 22, February 5, 1930.
phase of historical research and will be treated in the following
pages. Handling the data is accomplished by organizing them
for effective presentation. In interpreting the data it is necessary
to give attention to their accuracy and validity.

The sources of historical educational research data. In
historical research a distinction is made between primary and
secondary sources of data. The former comprise remains or
relics and written documents which have survived from the
period being studied and which represent or contain first-hand
information relevant to the problem of the investigation.
Written documents are of many kinds. Charters, town records,
court records, constitutions, records of legislation, and reports
of public officers may be classified as official documents. Unofficial
documents include newspapers, magazines, letters,
diaries, autobiographies, and chronicles. College and university
catalogs and registers, and courses of study and syllabi of other
public and private schools possibly constitute semi-official
sources since they lack the legal and governmental characteristics
of truly official documents and are more impersonal than
documents correctly termed unofficial. Textbooks of the past,
a very valuable source of historical educational research data,
may be classified as relics. The title page and the preface may
warrant classification as a document.

Secondary sources are accounts of the past written by persons
who had access either directly, or indirectly, to primary or
original sources. Typical secondary sources are reports of
previous studies in the field of the problem and texts on the
history of education. It may be contended that in order for an
investigation to qualify as historical research some use must
be made of primary sources. If the investigator does not collect
at least a portion of his data from primary sources he cannot
make an “original” contribution. Secondary sources may be
useful in making evaluations of the appropriateness, validity,
accuracy, and adequacy of the data secured from primary
sources and in making inferences with respect to cause and
effect relationships, where the data obtained from primary
sources by the investigator are insufficient for this purpose.

Locating sources and copying data from them. Many of the
sources in a library may be located by referring to the card
catalog, but if access to the stacks is permitted, it is advantageous
to examine the volumes in the sections relating to the
problem. Frequently, the investigator must seek sources outside
the library of his own institution. Many of the larger
libraries have special collections of historical documents and the
investigator should seek information concerning those that relate
to his problem. A graduate student will usually be able to
secure this information through his adviser, but it may be
necessary to make direct inquiry. For some problems, records
may be sought in public offices, and valuable source material,
especially correspondence, may be located in the possession of
private individuals. Sometimes valuable material may be
located in second-hand bookstores and even in unexpected
places. As the investigator reads the sources that he has located,
he should be alert in noting references to additional sources.
The quality of historical research is conditioned by the adequacy
of original sources examined and, hence, a study should
not be undertaken unless it appears feasible to obtain access to
adequate sources.

In copying data from sources, the procedure is that of recording
a reference to the source and the items of information
relevant to the problem. A technique relative to this procedure
recommended by historians is that of recording each separate
item of information on a separate card or sheet with a citation
to its source. For convenience the sources may be numbered
and the citation may be made by writing only the number of
the source and the page from which the item is taken. The
arrangement of the various items of information on separate
cards or sheets greatly facilitates their organization.¹ Checking

¹ The following reference may be consulted for further details of note taking:

Dow, E. W. Principles of a Note System for Historical Studies. New York:
Century Company, 1924. 124 pp.
all items with respect to accuracy in copying is an obviously
essential procedure. In selecting the items for copying, the
investigator should be guided by his problem. This is especially
important in the case of problems relating to the comparatively
recent past for which extensive source materials are available.
The investigator should also attempt to determine the accuracy
and validity of the items copied.

Organizing historical data. The organization of historical
data varies with the type of problem. Sometimes a chronological
organization is desirable. In other cases the organization is
about certain sub-topics. The purpose is to bring together
related facts so that they may be effectively presented. This
requires the formulation of some plan which should be made
apparent to the reader. A mere recital of factual statements
makes monotonous reading, and a reader’s estimate of a report
is likely to be influenced by the author’s organization of his
material.

Determining the accuracy and validity of historical data.¹
Dates and other factual statements found in sources may not
be accurate. The techniques described in Chapter V for identifying
errors of measurement are not applicable to historical data,
but the historical investigator should endeavor to determine
the dependability of his sources and consequently the accuracy


¹ For further discussion of the techniques of historical criticism, the following
references may be consulted:

Bernheim, op. cit., pp. 236-438.

Fling, op. cit., pp. 48-125.

Freeman, E. A. Methods of Historical Study. London: Macmillan and Company,
1886. 335 pp.

George, H. B. Historical Evidence. Oxford: Clarendon Press, 1909. 223 pp.

Good, H. G. “Historical Research in Education,” Educational Research
Bulletin (Ohio State University), 9: 7-18, 39-47, 74-78, January and February,
1930.

Johnson, Allen. The Historian and Historical Evidence. New York: Charles
Scribner’s Sons, 1926. 179 pp.

Langlois et Seignobos, op. cit., pp. 71-208.

Marshall, R. L. The Historical Criticism of Documents. New York: The
Macmillan Company, 1920. 62 pp.

Seignobos, Ch. La Méthode Historique Appliquée aux Sciences Sociales.
Paris: Félix Alcan, 1909, pp. 29-93. (Second edition.)

Vincent, op. cit., pp. 19-260.
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of the data obtained from them As a means of accomplishing 
this, he should inquire concerning the author of the document, 
the extent to which he was competent, unbiased, and in a posi- 
tion to report accurately the events described, where and when 
the document was written, and what other investigators may 
have discovered concerning its dependability. When compara- 
ble sources exist, they should be consulted in this connection. 
A source may be examined with reference to internal con- 
sistency. Contradictory or inconsistent statements within a 
source are indicative of a lack of dependability. 

Records, catalogs, newspapers, diaries, personal letters, and
other writings of the period may usually be accepted as dependable
sources, but data taken from them should be correctly
labeled. For example, if the factual statements relate to legislative
enactments, they should not be used uncritically as descriptive
of educational practice. Legislation pertaining to
education sometimes legalizes practices long current. In other
cases a law is enacted but observed to only a limited extent.
This is illustrated by the Massachusetts Law of 1642. It directed
officials of the towns to ascertain from time to time
whether or not parents and masters of apprentices were fulfilling
their educational duties. The Law of 1647 ordered the
establishment of schools, indicating that the Law of 1642 was
not functioning effectively. Hence, an inference from the Law
of 1642 that parents and masters universally fulfilled their
educational duties would be unjustified. Neither should one
infer from the Law of 1647 that schools had not been established
prior to 1647. Other sources indicate that a number of schools
were in operation for years prior to that date.

Newspaper advertisements are an important source of data
respecting the private schools of colonial days.¹ One can be
certain from these advertisements what courses were offered by
the private schoolmasters. One can infer the courses taught
with reasonable justification, but not with certainty, since the
advertisements may not have stimulated enrollment in all of
the courses listed. Numerous independent advertisements of a
given subject, however, would appear to justify, with practical
certainty, the conclusion that the subject was taught at the times
indicated by the dates of the newspapers. The schoolmasters would
not have been likely to continue to advertise had there not been a
demand for the courses offered.

¹ See Seybolt, R. F. “The Evening School in Colonial America,” University
of Illinois Bulletin, Vol. 22, No. 31, Bureau of Educational Research Bulletin,
No. 24. Urbana: University of Illinois, 1925. 68 pp.

Seybolt, R. F. “Source Studies in American Colonial Education: The Private
School,” University of Illinois Bulletin, Vol. 23, No. 4, Bureau of Educational
Research Bulletin, No. 28. Urbana: University of Illinois, 1925. 109 pp.

The preceding paragraph illustrates an important principle 
in historical research. The occurrence of an event is established 
by the agreement of reports of independent and competent 
witnesses. The appearance of advertisements relating to a 
given course in newspapers of different towns at approximately 
the same time would seem to establish that the course was 
taught at that time. 

Although official records are generally considered dependable,
the critical worker will be alert to detect evidence of inconsistency
or error. This is especially desirable in the case of
school records. Our present methods of child and financial
accounting are a recent achievement, and it has been found
that comparatively recent school records were, in some cases,
incomplete or misleading. Reminiscences and accounts of
events within the lifetime of the author, but written sometime
after their occurrence, are usually less dependable than diaries
or personal letters. The author’s memory is not likely to be
complete and accurate, at least in regard to dates and details
of events. He is also likely to interpret happenings of the past
in the light of subsequent events. Official reports may involve
errors. Occasionally the error may be due to intentional falsification,
but more frequently the cause is inadequate information
for the period covered by the report. Information concerning
educational practices and conditions within a state or
larger area is usually gathered by correspondence, and it is
difficult, even today, to secure a response of 100 per cent to an
official request. Furthermore, sometimes information furnished
the compiler is not accurate or complete.

When the data are generalizations found in writings of the
period, it must be remembered that until comparatively recently
society was highly provincial and consequently that only
a very few writers possessed accurate knowledge concerning
educational conditions over a large area. Even when the generalization
is restricted to a limited area, it was not usually
based upon systematically collected data. Furthermore, it is
important to know whether the writer was influenced by any
bias or prejudice.

The dependability of historical research. By being systematic
and by checking the copied items, a historical investigator
can insure the accuracy of the items as judged by
the sources from which they were obtained, but evaluation
of their historical accuracy and validity is likely to involve
judgment, and hence this phase must be designated as subjective.
The selection of sources and of the items copied
from them is also subjective. Inferences made with respect
to cause and effect relationships and generalizations respecting
the prevalence of a practice or an idea cannot be other
than subjective. The historical research worker cannot keep
himself out of his research, if we interpret such research to
include more than the mere collection of data. The point to
be noted, however, is that the historical investigator should
seek to know his data and to make interpretation in accord
with their limitations. In other words, he should endeavor to
maintain a scientific attitude.

Generalizations should be made with caution. A particularly
dangerous type of generalization is that a given event was
the first of its kind. For example, first use of the questionnaire
as a means of collecting data has been credited to
Sir Francis Galton in about 1875.¹ There are numerous evidences,
however, that this instrument was used prior to Galton’s
time.² The cautious investigator will report that the event is
the earliest of which he has record. In generalizing with respect
to the prevalence of a given practice at a given time it is desirable
to support the generalization by citations to numerous and
well distributed occurrences of the practice. Generalization in
historical research requires, as in other types of research, all of
the data or a representative sample of them. Some very misleading
statements appear in current histories of education
because of the hasty generalizations made by their authors on
the basis of scanty and unrepresentative data.

¹ Henderson, E. N. “Francis Galton,” Cyclopedia of Education, Vol. 3.
New York: The Macmillan Company, 1912, p. 4.

² See Monroe, W. S., et al. “Ten Years of Educational Research, 1918-1927,”
University of Illinois Bulletin, Vol. 25, No. 51, Bureau of Educational Research
Bulletin, No. 42. Urbana: University of Illinois, 1928, pp. 36-38.

A BIBLIOGRAPHY OF TYPICAL HISTORICAL EDUCATIONAL
RESEARCH STUDIES

The student interested in the history of education or in conducting a
historical research will find the following references helpful. A critical
reading of several of these studies affords an excellent learning activity for
engendering a knowledge of historical research techniques. Other historical
studies can be located by consulting Monroe, Walter S., and Shores, Louis,
Bibliography and Summaries in Education, under “History of Education.”

Abelson, Paul. “The Seven Liberal Arts,” Teachers College, Columbia
University Contributions to Education, No. 11. New York: Bureau of
Publications, Teachers College, Columbia University, 1906. 150 pp.

Brown, S. W. “The Secularization of American Education as Shown by
State Legislation, State Constitutional Provisions, and State Supreme
Court Decisions,” Teachers College, Columbia University Contributions
to Education, No. 49. New York: Bureau of Publications, Teachers
College, Columbia University, 1912. 160 pp.

Cole, P. R. “Later Roman Education in Ausonius, Capella, and the
Theodosian Code,” Teachers College, Columbia University Contributions
to Education, No. 27. New York: Bureau of Publications, Teachers
College, Columbia University, 1909. 39 pp.

Dearborn, N. H. “The Oswego Movement in American Education,”
Teachers College, Columbia University Contributions to Education,
No. 183. New York: Bureau of Publications, Teachers College,
Columbia University, 1925. 191 pp.

Fitzpatrick, E. A. “The Educational Views and Influence of DeWitt
Clinton,” Teachers College, Columbia University Contributions to
Education, No. 44. New York: Bureau of Publications, Teachers
College, Columbia University, 1911. 157 pp.

Good, H. G. “The Sources of Spencer’s ‘Education,’” Journal of Educational
Research, 13:325-35, May, 1926.

Gwynn, Aubrey. Roman Education from Cicero to Quintilian. Oxford:
Clarendon Press, 1926. 260 pp.

Hansen, A. O. Liberalism and American Education in the Eighteenth Century.
New York: The Macmillan Company, 1926. 317 pp.

Jackson, G. L. “The Development of School Support in Colonial Massachusetts,”
Teachers College, Columbia University Contributions to
Education, No. 25. New York: Bureau of Publications, Teachers
College, Columbia University, 1909. 95 pp.

Kemp, W. W. “The Support of Schools in Colonial New York by the
Society for the Propagation of the Gospel in Foreign Parts,” Teachers
College, Columbia University Contributions to Education, No. 56. New
York: Bureau of Publications, Teachers College, Columbia University,
1913. 279 pp.

Knight, E. W. “The Influence of Reconstruction on Education in the
South,” Teachers College, Columbia University Contributions to Education,
No. 60. New York: Bureau of Publications, Teachers College,
Columbia University, 1913. 100 pp.

Maddox, W. A. “The Free School Idea in Virginia before the Civil War,”
Teachers College, Columbia University Contributions to Education,
No. 93. New York: Bureau of Publications, Teachers College, Columbia
University, 1918. 225 pp.

Noble, S. G. “Early School Superintendents in New Orleans,” Journal of
Educational Research, 24:274-79, November, 1931.

Rabenort, W. L. “Spinoza as Educator,” Teachers College, Columbia
University Contributions to Education, No. 38. New York: Bureau of
Publications, Teachers College, Columbia University, 1911. 87 pp.

Reigart, J. F. “The Lancastrian System of Instruction in the Schools of
New York City,” Teachers College, Columbia University Contributions to
Education, No. 81. New York: Bureau of Publications, Teachers
College, Columbia University, 1916. 105 pp.

Robbins, C. L. “Teachers in Germany in the Sixteenth Century: Conditions
in Protestant Elementary and Secondary Schools,” Teachers
College, Columbia University Contributions to Education, No. 52. New
York: Bureau of Education, Teachers College, Columbia University,
1912. 126 pp.

Taylor, H. C. “Educational Significance of the Early Federal Land
Ordinances,” Teachers College, Columbia University Contributions to
Education, No. 118. New York: Bureau of Publications, Teachers
College, Columbia University, 1922. 138 pp.

Totah, K. A. “The Contribution of the Arabs to Education,” Teachers
College, Columbia University Contributions to Education, No. 231. New
York: Bureau of Publications, Teachers College, Columbia University,
1926. 105 pp.

Updegraff, Harlan. “The Origin of the Moving School in Massachusetts,”
Teachers College, Columbia University Contributions to Education,
No. 17. New York: Bureau of Publications, Teachers College,
Columbia University, 1908. 186 pp.

Wells, G. F. “Parish Education in Colonial Virginia,” Teachers College,
Columbia University Contributions to Education, No. 138. New York:
Bureau of Publications, Teachers College, Columbia University, 1923.
95 pp.

Woody, Thomas. “Early Quaker Education in Colonial Pennsylvania,”
Teachers College, Columbia University Contributions to Education,
No. 105. New York: Bureau of Publications, Teachers College,
Columbia University, 1920. 287 pp.



CHAPTER VII 

CONSTRUCTING MEASURING INSTRUMENTS

A. General Principles

The scope of measurement in education. When measurement
in the field of education is mentioned, we commonly think
merely of the administration of a test designed to measure either
general intelligence or some segment of school achievement.
This is an unfortunate restriction. For years a fundamental
credo of research workers has been: Whatever exists at all exists
in some amount and can be measured.¹ Although this ideal has
not been fully realized, many ingenious measuring instruments
have been devised, and today we are able to measure, at least
crudely, many things other than general intelligence and school
achievement. These include personality traits, interests, attitudes,
socio-economic status of homes, school buildings, quality
of supervision, educational need, and the like.

¹ This statement is frequently attributed to Thorndike, but the present
writers have been unable to locate it in his writings. The following quotations,
however, express the essential ideas: “Whatever exists at all exists in some
amount.” “We have faith that whatever people now measure crudely by mere
descriptive words, helped out by the comparative and superlative forms, can
be measured more precisely and conveniently if ingenuity and labor are set at
the task.”

Thorndike, E. L. “The Nature, Purposes, and General Methods of Measurements
of Educational Products,” The Seventeenth Yearbook of the National Society
for the Study of Education, Part II. Bloomington, Illinois: Public School
Publishing Company, 1918, p. 16.

A basic assumption.² It may seem platitudinous to point
out that in attempting the measurement of human abilities and
traits, their existence is assumed, but this assumption means
more than is commonly realized. Existence implies stability.
Human performance varies³ even when conditions are considered
to be unchanged, and forgetting is a characteristic phenomenon.
Hence, it may be inferred that human abilities and
traits are not perfectly stable. To the extent that they fluctuate,
the basic assumption of stability is not satisfied and their
measurement is subject to a significant limitation.

² No attempt is made to consider the assumptions involved in the measurement
of human abilities and traits. The following references represent two
attempts to formulate assumptions in certain fields of measurement:

Tyler, Ralph W. “Assumptions Involved in Achievement-Test Construction,”
Educational Research Bulletin, 12:29-36, February 8, 1933.

Hendrickson, Gordon. “Assumptions Underlying Personality Measurement,”
Journal of Experimental Education, 2:243-49, March, 1934.

The following reference may also be read with profit in this connection:

Lindquist, E. F., and Anderson, H. R. “‘Achievement’ Tests in the Social
Studies,” The Educational Record, 14:198-256, April, 1933.

³ See page 131.

The meaning of measurement as applied to human abilities
and traits. In measuring distance, weight, and the like, the
procedure may be described as “direct,” meaning that the
measurer deals “directly” with the magnitude whose measurement
is desired. The most direct measurement of human abilities
or traits is accomplished by securing the performance that
the ability or trait makes possible or insures and then describing
this performance in quantitative terms. In this sense the
measurement of calculation skill in arithmetic or of ability
in handwriting may be described as direct. Measurement, however,
may be indirect. Temperature is measured by measuring
the height of a column of liquid. In this case, indirect measures
are valid because the correlation between temperature and the
height of the column of liquid is approximately 1.00. The weight
of children may be measured indirectly by measuring their
height, but obviously the measures obtained in this way will not
be highly valid. Many of our measures of human abilities and
traits are indirect. For example, when measuring the silent
reading ability of sixth-grade children, we usually desire an index
of their performance when they read assignments in history,
literary selections, and the like. In a particular case, we may
desire an index of their performance when they read a particular
type of material in response to certain directions and under
certain conditions, such as reading an article in an encyclopedia
to secure information relative to certain questions and doing this
in a library where there are distracting influences. In such cases,
scores on a typical silent reading test are indirect measures of
the designated ability. In many cases the difference between the
obtained performance and the one we are interested in is even
more obvious. We use the performances given in response to
true-false exercises or some other type of objective test as a
means of securing measures of the ability to do something very
different. Such indirect measures may be useful, but frequently
they involve relatively large variable errors of validity.
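
The contrast between the thermometer and the height-weight case may be made concrete by computing the coefficient of correlation for each. The following Python sketch is offered only as an illustration: the function name pearson_r and every figure in it are invented for the purpose and are not taken from any study cited in this chapter.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical thermometer: the column height is assumed to be an exact
# linear function of temperature.
temperatures = [0, 10, 20, 30, 40]
column_heights = [2.0 + 0.15 * t for t in temperatures]

# Hypothetical children: heights in inches and weights in pounds
# (invented figures, chosen only to show an imperfect relationship).
heights = [48, 50, 52, 54, 56, 58]
weights = [62, 55, 70, 66, 85, 72]

r_thermometer = pearson_r(temperatures, column_heights)
r_children = pearson_r(heights, weights)
```

Under these assumed figures the first coefficient equals 1.00 within rounding, while the second falls well below it. This is the sense in which height yields a less valid indirect measure of weight than the column of liquid yields of temperature.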

A test score should be thought of as a measure of ability to
do, capacity to do, or tendency to do under certain conditions
and at a certain time. Achievement is commonly thought of as
having some degree of permanency. Usually the ability whose
measurement is desired is that which will function at some date
after the administration of the test.¹ Hence, the complete label
of test scores as measures would include specifications in regard
to the conditions under which the ability to do, capacity to do,
or tendency to do is to function and the date on which the
functioning is to occur. For example, spelling achievement might be
defined as the ability to spell under the conditions of actual
writing a week after the testing, assuming normal forgetting and
only incidental learning. Hence, if the meaning implicitly
associated with the scores yielded by a test were explicitly
stated, the attached label would be of the following type:
“numerical index of ability to ______ under ______
conditions at ______ date, the intervening learning and
forgetting being ______.”
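
The label just described has four blanks, and it may help to see them filled in for the spelling example. The sketch below is a hypothetical illustration in Python: the class name ScoreLabel and its field names are assumptions introduced here, and the wording of the rendered label is lightly adapted from the template so that the filled-in sentence reads smoothly.

```python
from dataclasses import dataclass


@dataclass
class ScoreLabel:
    """The four specifications that a complete score label would carry."""
    ability: str      # what the pupil is to be able to do
    conditions: str   # conditions under which the ability is to function
    date: str         # when the functioning is to occur
    intervening: str  # assumed learning and forgetting in the interval

    def render(self) -> str:
        # Wording adapted from the template given in the text.
        return (
            f"numerical index of ability to {self.ability} "
            f"under {self.conditions} conditions at {self.date}, "
            f"the intervening learning and forgetting being {self.intervening}"
        )


# The spelling example from the text, expressed with the assumed fields.
spelling = ScoreLabel(
    ability="spell",
    conditions="actual-writing",
    date="a date one week after testing",
    intervening="normal forgetting and only incidental learning",
)
label = spelling.render()
```

Writing the label out in this way makes plain that two tests of "spelling" carrying different specifications are not measures of the same thing.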

¹ For two studies of retention at the high school level, see:

Layton, E. T. “The Persistence of Learning in Elementary Algebra,” Journal
of Educational Psychology, 46-55, January, 1932.

Kennedy, L. R. “The Retention of Certain Latin Syntactical Principles by
First and Second Year Latin Students after Various Time Intervals,” Journal
of Educational Psychology, 23:132-46, February, 1932.

A measure of an ability or trait is a numerical index of a
designated performance to be given under prescribed conditions,
and hence the only essential requirement is that the obtained
measures correctly discriminate between individuals
whose future performances differ even when these differences
are small. In other words, the scores made by pupils are to be
considered satisfactory measures if the difference between them
corresponds to the difference between the future performances
of these pupils under the prescribed conditions.
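
One rough way of examining whether scores discriminate between pupils as their later performances do is to count the pairs of pupils that both measures place in the same order. The Python sketch below rests on assumptions not found in the text: the function name concordance is invented, the scores are fictitious, and agreement in direction only is checked, which is a simplification of the requirement that the differences themselves correspond.

```python
from itertools import combinations


def concordance(test_scores, later_performance):
    """Share of pupil pairs ordered the same way by both measures.

    Pairs tied on either measure are left out of the count.
    """
    agree = 0
    compared = 0
    for i, j in combinations(range(len(test_scores)), 2):
        d_test = test_scores[i] - test_scores[j]
        d_later = later_performance[i] - later_performance[j]
        if d_test == 0 or d_later == 0:
            continue  # a tie gives no evidence about ordering
        compared += 1
        if (d_test > 0) == (d_later > 0):
            agree += 1
    return agree / compared if compared else float("nan")


# Invented scores for five pupils; not data from any cited study.
test_scores = [12, 15, 15, 20, 23]
later_performance = [30, 41, 38, 37, 52]
share = concordance(test_scores, later_performance)
```

A share near 1 would suggest that the test orders pupils much as their later performance does. A considerably lower share would suggest that small differences in score should not be trusted.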

Direct measurement is not essential, but it should be noted
that indirect measurement depends upon the stability of the
relationship between what the test measures directly and the
ability or trait whose measurement is desired. This relationship
is not necessarily a fixed one. It depends upon the objectives
towards which the instruction has been directed, and it is likely
to be changed by the repeated administration of a test of a given
type. Hence, indirect measurement is based upon a relationship
that is subject to change. This means that the validity of a given
instrument is not necessarily a stable characteristic. In other
words, a test that is highly valid in one situation is not necessarily
equally valid in another. This point is especially important
in connection with true-false, multiple-choice, and other
types of tests constructed by teachers. Such instruments may
yield highly valid indirect measures of achievement when first
administered to a group of students, but the validity is likely to
decrease as the students direct their efforts to becoming able to
respond to them, unless the ability to respond to such exercises
is recognized as a desirable educational objective.¹

¹ The widespread use of objective tests by teachers is undoubtedly contributing
much to setting the objectives towards which students direct their efforts.
This condition is probably making the tests increasingly less valid as measures
of the desired achievement. For an elaboration of this point and comments upon
certain undesirable results of the use of the objective tests, see Douglass, H. R.
“The Effects of State and National Testing on the Secondary School,” School
Review, 42:497-509, September, 1934.

For other points of view see Barr, A. S., and others. “A Symposium on the
Effects of Measurement on Instruction,” Journal of Educational Research,
28:481-527, March, 1935.

General types of instruments for measuring human abilities
and traits. The most common type of instrument for measuring
human abilities and traits is the performance test, which requires
the person whose capacity or achievement is being measured
to give a performance, usually a written one, which represents
the functioning of that capacity or achievement or which is
considered to be indicative of this capacity or achievement.
Psychological questionnaires, which are used to obtain measures
of general patterns of conduct, represent a second type of
measuring instrument. The performance required consists of answers
to questions concerning practices, beliefs, preferences, judgments,
and the like. A third type of measurement procedure is
rating or estimating, with or without a formal scale, on the basis
of general acquaintance or systematic observation. To these
three types of instruments there may be added observation
(noting the frequency of performance of certain acts), interviewing,
and laboratory and clinical techniques.²

² For a more comprehensive treatment of measurement procedures, see
Symonds, P. M. Diagnosing Personality and Conduct. New York: Century
Company, 1931. Chapters II, III, XI, XII, and XIV form an excellent reference
supplementary to the present discussion. Symonds also deals with free
association tests.

Basic problems in the construction of measuring instruments.
Before the construction of a test is begun, it is necessary to
determine what is to be measured. Ability in spelling, ability in
silent reading, ability in arithmetic, and the like are commonly
used as if such phrases designated defined achievements. Usually
they do not. In fact, we know relatively little about the
nature of the abilities we claim to measure. Within a given
subject-matter field, achievement usually represents a combination
of two or more types of controls of conduct. There is also
considerable overlapping between achievements in different
fields, due in part to the presence of general intelligence as a
common factor. Beyond such general items, our information is
limited.

The analysis of human abilities and traits and the identification
of their elements are basic problems, and logically their solution
should precede the construction of measuring instruments.
These problems are engaging the attention of a number of
workers,³ but it will be some time before we have the basic
information that we need in test construction. Meanwhile, a
testmaker should attempt to specify as definitely as possible the
nature of the ability or trait he desires to measure. It is especially
important that he recognize and distinguish between
the different types of achievement. In the measurement volume
of the report of the Commission on the Social Studies, seven
rubrics are indicated for this field:⁴ (1) exact information (memorized
facts), (2) technical vocabulary, (3) ability to apply
ideas and information to new situations, (4) skills, (5) interests,
(6) attitudes, (7) ability to express. The learnings included
vary with the subject-matter field, but an analysis of this type
indicates the general character of the achievement. A testmaker
may not wish to measure all phases, but he should explicitly
recognize the restriction.

³ See references to factor analysis in Chapter XI.

⁴ Kelley, T. L., and Krey, A. C. Tests and Measurements in the Social Sciences.
New York: Charles Scribner’s Sons, 1934, p. 105.

The general problems involved in constructing a performance
test. Given the specifications of the abilities to be measured,
the next problem is to determine the type of test to be constructed.
With reference to the difficulty of the exercises of a
test, their arrangement may be irregular, uniform, or scaled.
In the first type, the arrangement of the exercises is not related
to their relative difficulty. Tests of this type are illustrated by
those prepared informally by teachers. In the second type the
exercises are of equal difficulty, or approximately so. In the last
type the exercises vary from very easy to very difficult and are
arranged in ascending order of difficulty. Frequently, in such
tests the increase in difficulty from exercise to exercise is
approximately uniform. Variations of these types include spiral
and cycle tests. In the former, different sub-tests may be arranged
in order of increasing difficulty, the exercises within
each sub-test being uniform in difficulty. In cycle tests, different
types of exercises recur at regular intervals. In determining the
type of test to be constructed, the testmaker should be guided
by the nature of the measurement desired. If it is desired to
secure a measure of the rate of the functioning of a group of
skills, the test should be uniform in difficulty. When “power”
is to be measured, a scaled test is generally employed.⁵
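
The three basic arrangements may be distinguished mechanically once a difficulty value has been assigned to each exercise. The following Python sketch is a hypothetical illustration only. It assumes that difficulty is expressed as the proportion of pupils failing the exercise, that a spread of 0.05 or less counts as "approximately equal" (the tolerance parameter), and that the function name classify_arrangement is ours. Spiral and cycle tests are not handled.

```python
def classify_arrangement(difficulties, tolerance=0.05):
    """Classify a test by the order of its exercise difficulties.

    `difficulties` holds one value per exercise, in test order, with
    higher values meaning harder exercises. `tolerance` is an assumed
    threshold for treating all difficulties as approximately equal.
    """
    spread = max(difficulties) - min(difficulties)
    if spread <= tolerance:
        return "uniform"
    # Scaled: each exercise is at least as difficult as the one before it.
    ascending = all(b >= a for a, b in zip(difficulties, difficulties[1:]))
    if ascending:
        return "scaled"
    return "irregular"


# Invented difficulty values for three short tests.
rate_test = classify_arrangement([0.50, 0.52, 0.49, 0.50])
power_test = classify_arrangement([0.10, 0.30, 0.50, 0.70, 0.90])
teacher_test = classify_arrangement([0.40, 0.90, 0.20, 0.70])
```

With these invented values the first test is classed as uniform, the second as scaled, and the third as irregular, corresponding to a rate test, a power test, and an informal teacher-made test respectively.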

The second problem is that of devising and selecting test
exercises that will secure pupil performances satisfactory for the
measurement of the specified ability or trait. It is necessary that
the performance given in response to an exercise be observable.
A second requirement is that the exercises be valid. It is desirable
that the responses to the exercises be objectively scorable.

The third general problem relates to the determination of the
length of the test (number of test items), organization of the test
items, administration time, and formulation of an explanation of
the nature of the exercises to the subjects taking the test and of
instructions for administering it.

The fourth problem relates to the quantitative description of
the obtained test performance. Several questions are involved:
rules for scoring, weighting of the exercises, units of the scale of
description, and its zero point. In the case of handwriting, written
composition, and other performances for which a general
quality description is desired, there are the problems of devising
an appropriate quality scale and determining the best technique
for using it.

Testing the completed instrument forms the final problem.
Two questions require consideration: (1) How accurately does
the test measure what it actually measures? (2) How accurately
does it measure the specified ability or trait? These two questions
are commonly designated as those of reliability and
validity.

⁵ For a more extended discussion, see Monroe, W. S. An Introduction to the
Theory of Educational Measurements. Boston: Houghton Mifflin Company,
1923, pp. 62-76.

In view of the large number of tests and other measuring
instruments that have been constructed, it may appear that
these problems should have been solved for many abilities and
traits. It is true that for measuring general intelligence, general
educational status below the senior high school level, and
achievement in certain subject-matter fields, a number of tests
are available whose scores are highly reliable and exhibit fairly
close agreement with the criterion measures used to determine
their validity. Critical study of available tests, however, reveals
a number of unsolved problems, and the research worker who has
occasion to select a measuring instrument for a particular purpose,
especially when conducting an experimental investigation,
seldom is able to find one that exactly meets his requirements.
Hence, the field of measurement still offers challenges to the
research worker. In fact, there are a number of important
problems that have scarcely been touched.

B. Details of Test Construction 

Devising and selecting test exercises. The criteria to be
observed in devising and selecting test exercises are (1) observability
of performance, (2) convenience in administering the
test and in scoring the test papers, (3) objectivity in scoring,
(4) difficulty of the test items, and (5) validity. The second and
third criteria are important requirements when the test is designed
for general use, but when it is being devised as a means of
securing measures needed in an experimental study or some
other investigation, convenience in administering and in scoring
the test papers and objectivity¹ are desirable but not necessary.
Having determined upon the type or types of exercises to be
employed, the testmaker proceeds to devise a number of such
exercises, being guided by his estimates of validity and difficulty.
These exercises are then administered to a typical population
for an experimental determination of validity and difficulty.

¹ Many persons appear to consider objectivity in scoring an essential requirement
for even a reasonably satisfactory test. This position is not justified by the
facts. For example, see Traxler, A. E., and Anderson, H. A. “The Reliability of
an Essay Test in English,” School Review, 43:534-39, September, 1935.

English essay examinations scored under carefully controlled conditions
yielded reader reliabilities of .94 for one form and .85 for the other. Correlation
between two forms was .60.

Criteria of validity to be observed in devising test exercises.²
Although indirect measurement is possible and the only dependable
determination of validity is by ascertaining the degree
to which the exercise contributes to the identification of individual
differences in the ability or trait whose measurement is
desired, the testmaker should give attention to the content of the
exercises. It is desirable to make the measurement as direct as
possible. If specific habits are to be measured, it is sometimes
possible to devise exercises that will call for the normal functioning
of them. In the case of the calculation skills of arithmetic,
there is the obvious suggestion that the test exercises consist of
typical examples. In many cases, however, normal functioning
is not feasible. When spelling ability is to be measured, the
dictation of word lists does not provide for normal functioning.
The act of silent reading does not eventuate in an observable
performance. Hence, it is necessary to devise a type of exercise
that will call for silent reading plus an observable performance
indicative of the ability to read. When an exercise calling for
normal functioning is not feasible, the testmaker should be
guided by the experimental evidence pertaining to the relative
validity of the possible types of exercises.

² The reader will find it helpful to refer to the discussion of the causes of errors
of validity in Chapter V.

When the achievement to be measured consists of the ability
to respond to arithmetical problems or to other types of thought
questions, an obvious suggestion is that the test exercises should
present typical problematic situations in the field of knowledge
achievement being considered. In complying with this suggestion,
it is necessary to bear in mind the requirement of convenience
and objectivity in scoring. If the test is designed as an
instrument for general use, there is the further requirement that
the time necessary for responding to an exercise be reasonably
short. In order to satisfy these requirements, testmakers have
proposed various types of objective exercises: true-false, multiple-choice,
completion, and the like. The use of such exercises
raises the question of the extent to which they measure the
achievement that functions in responding to questions that ask
the student to discuss, explain, compare, and the like. They
cannot measure such achievement directly, but indirect measurement
is possible. On the basis of rather crude data, some testmakers
have concluded that tests consisting of objective exercises
do measure such achievement to a sufficient extent to
justify their use. More carefully planned inquiries indicate that
an objective test measures about 60 per cent of what is measured
by an essay type of examination when the latter is carefully prepared
and scored.¹ A community of function of this degree does
not make the measurement of knowledge achievement by means
of objective tests very satisfactory. When a short testing time
and convenience and objectivity in scoring are not imperative
requirements, the use of exercises that will yield more
direct measures of knowledge achievement is to be recommended.

When the achievement to be measured consists of general
patterns of conduct, the nature of this rubric of controls of conduct
suggests that the criterion of the acquisition of an attitude,
interest, ideal, or other generalized control is the conformity to
the pattern in responding to a variety of situations. This seems
to be implied in the concept of a generalized control of conduct.
There is the further implication of conformity to the pattern in
spite of temptation to do otherwise, or at least of conformity in
situations in which there is no expressed or implied requirement
of conformity. Hence, a formal test administered within a
typical time limit does not provide for the normal functioning of
a general pattern of conduct. This means that the construction
of a formal test to measure an attitude, interest, ideal, or other
generalized control must be based upon the principle of indirect
measurement, at least in part.

Criteria relating to the difficulty of test exercises. In the case
of speed tests, especially when the function is narrow, the nature
of the exercises is determined by the purpose of the measuring
instrument. This would be true of a test designed to measure

¹ Cochran, R. E., and Weidemann, C. C. “‘Explain’ Essay versus Word-Answer
Fact Test,” The Phi Delta Kappan, 17: 59-61, December, 1934.

This point is considered further in Chapter VIII. The technique employed
in these recent investigations is described in Chapter XI as well as the meaning
of the statement that one test measures 60 per cent of what is measured by another
test.

skill in the fundamental addition combinations or skill in doing
arithmetical examples of a given type. In many cases, however,
the difficulty of the exercises requires consideration.

Although a testmaker's estimates of difficulty will not be very
dependable, he should endeavor to devise exercises whose difficulty
is that desired in the completed test. In a scaled test, the
exercises vary in difficulty from very easy to very difficult.
Hence, when this type of test is being constructed, the testmaker
should endeavor to devise exercises that represent a wide range
of difficulty. If a uniform test is being constructed, he should
endeavor to have the exercises approximately equivalent in
difficulty and the degree of estimated difficulty should be that
for which the test will yield the most valid measures. It is
obvious that an exercise so easy that all subjects give the correct
response or so difficult that none respond correctly is worthless
for testing purposes, because its discriminating value would be
zero. This condition suggests the inference that the maximum
discriminating value is attained when an exercise is done correctly
by about fifty per cent of the subjects. This inference is
supported by some experimental evidence, but it appears that
the discriminative value varies only slightly between thirty and
seventy per cent of correct responses.¹ Hence, when devising
exercises for a uniform test, an effort should be made to have the
difficulty fall within these limits.
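
The check against the thirty and seventy per cent limits can be illustrated with a short computation. What follows is a minimal sketch in Python, not a procedure given in the studies cited; the function names, the tryout data, and the treatment of the limits as inclusive are assumptions made for illustration.

```python
def percent_correct(responses):
    """Return the per cent of subjects answering each item correctly.

    `responses` holds one row per subject; each row contains 1 (right)
    or 0 (wrong) for every item of the preliminary test.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(row[i] for row in responses) / n_subjects
            for i in range(n_items)]


def within_limits(pcts, low=30.0, high=70.0):
    """Flag items whose per cent correct lies within the stated limits.

    The limits are treated as inclusive (an assumption).
    """
    return [low <= p <= high for p in pcts]


# Hypothetical tryout data: five subjects, four items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]
pcts = percent_correct(responses)
flags = within_limits(pcts)
for number, (pct, keep) in enumerate(zip(pcts, flags), start=1):
    print(f"item {number}: {pct:.0f} per cent correct, within limits: {keep}")
```

In this illustration the first item is answered correctly by every subject and therefore cannot discriminate at all; only the items falling between the limits would be retained for a uniform test.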

Experimental determination of the relative validity of individual
test exercises. A valid test reveals differences in
ability that exist and one that fails to reveal a difference which is
at all marked is distinctly lacking in this quality. This concept
of validity may be applied also to individual test items. When
so applied, it means that in general the responses to a single
exercise will be different for groups of pupils known to differ with
respect to average status of the ability whose measurement is
being attempted. For example, if two groups of pupils differ

¹ Thurstone, T. G. “The Difficulty of a Test and Its Diagnostic Value,”
Journal of Educational Psychology, 23: 335-43, May, 1932.

Symonds, P. M. “Choice of Items for a Test on the Basis of Difficulty,”
Journal of Educational Psychology, 20: 481-93, October, 1929.

with respect to silent reading ability, a valid exercise will yield a
higher per cent of correct responses by the group whose average
status is the higher. If the per cent of correct responses is approximately
the same for the two groups, the validity of the
exercise approaches zero. Another way of thinking of the
validity or discriminating power of a test item is in terms of the
accuracy with which subjects responding to the exercise are correctly
placed on the scale of ability by their responses. Perfect
placement is attained when all subjects who fail the item occupy
positions on the scale below any subject who gives the correct
response.

The first of the above expositions of the meaning of validity
as applied to individual test items¹ suggests a method of
determining the relative validity of the exercises of the preliminary
form of a test. The essential requirement is two or more
groups of subjects differing in average status with respect to the
specified ability. For precise determinations it is necessary to
have a series of groups whose average standings are evenly
spaced over the range of the ability within which measurement
is desired. For example, suppose ability to spell in normal writing
is specified. The first step would be to secure a number of
groups of pupils differing in average status with respect to this
ability and evenly spaced on the scale of the ability. Suppose
that ten such groups have been identified. Then the validity of
a single test item would be indicated by the corresponding series
of per cents of correct responses. A high degree of discriminating
power would be indicated by a series of per cents exhibiting
fairly uniform positive increments.
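
The inspection of a series of per cents for fairly uniform positive increments may be sketched as follows. This is an illustrative Python sketch only; the two series of per cents are invented, and the tolerance used to decide what counts as "fairly uniform" is a hypothetical parameter, not a value taken from the text.

```python
def increments(pcts):
    """Differences between the per cents of successive groups."""
    return [later - earlier for earlier, later in zip(pcts, pcts[1:])]


def fairly_uniform_positive(pcts, tolerance=10.0):
    """True when every increment is positive and none departs from the
    mean increment by more than `tolerance` (a hypothetical allowance)."""
    steps = increments(pcts)
    mean_step = sum(steps) / len(steps)
    all_positive = all(step > 0 for step in steps)
    all_close = all(abs(step - mean_step) <= tolerance for step in steps)
    return all_positive and all_close


# Per cents of correct responses for ten spaced groups, lowest ability first.
# Both series are invented for illustration.
discriminating_item = [10, 20, 28, 40, 50, 61, 70, 79, 90, 98]
weak_item = [45, 50, 48, 52, 47, 51, 49, 50, 53, 46]

print(increments(discriminating_item), fairly_uniform_positive(discriminating_item))
print(increments(weak_item), fairly_uniform_positive(weak_item))
```

The second series rises and falls irregularly from group to group, which is the pattern to be expected of an item with little discriminating power.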

A criterion is essential for the determination of groups that 
differ with respect to the specified ability. In the case of general 
intelligence and achievement abilities that obviously increase 
from year to year or from grade to grade, sequential age or grade 

¹ Sometimes “validity of test items” is used in the sense of internal consistency.
This interpretation leads to the use of the total score on the test being taken as
the criterion. Although investigation of the internal consistency of a test may
be desirable, the procedure should not be thought of as a means of determining
the validity of test items.





groups of typical pupils have been used. Binet, who in collaboration
with Simon in 1905 devised and published the first
general intelligence test,¹ employed age groups and selected those
exercises that were responded to correctly by a larger per cent of
the pupils of the older groups. Otis² in 1923 employed one group
of retarded pupils and one of accelerated pupils. Thurstone³
selected groups on the basis of scholarship records. In validating
exercises for trade tests, groups classified on the basis of training
and experience as novices, apprentices, journeymen, and experts
have been used. Groups selected in such ways are not likely to be
very evenly spaced with reference to average status and there
will be considerable overlapping. Hence, the determination of the
validity of the test items will not be entirely satisfactory. A
more precise criterion is needed for selecting spaced groups. If a
test is available which experience has proven to yield reasonably
valid measures of the ability or trait, it may be used for selecting
the members of the several groups. The Stanford Revision of the
Binet test is frequently used for this purpose in constructing a
group intelligence test. Usually, however, such an instrument is
not available.

After the test items have been administered to a series of 

¹ Binet, A., et Simon, T. “Méthodes Nouvelles pour le Diagnostic du Niveau
Intellectuel des Anormaux,” L'Année Psychologique, 11: 191-244, 1905.

Revisions were published by Binet and Simon in 1908 and by Binet alone in
1911. In 1908 Goddard made a translation of the scale devised by Binet and in
1911 he published a revision. Kuhlmann published a revision in 1912. In the
same year the Stanford Revision first appeared. This scale prepared by Terman
and others was made generally available in 1916. In 1917 Otis, working under
Terman, devised what is generally considered the first group intelligence scale.
His work was adopted by the committee of psychologists who prepared the well-known
Army Alpha Scale used in the testing of our military forces in 1917-18.
After the war, several group intelligence tests were prepared for school use and
intelligence testing in the public schools became widespread. For a brief history
of the development of intelligence tests consult the following reference or one of
several texts on intelligence tests.

Monroe, Walter S., et al. “Ten Years of Educational Research,” University
of Illinois Bulletin, Vol. 25, No. 51, Bureau of Educational Research Bulletin,
No. 42. Urbana: University of Illinois, 1928, pp. 89-90, 94-95.

² Otis, A. S. “The Making of a Classification Test,” Contributions to Education,
Vol. I. Yonkers-on-Hudson, New York: World Book Company, 1924,
pp. 149-59.

³ Thurstone, L. L. “Cycle-Omnibus Intelligence Test for College Students,”
Journal of Educational Research, 4: 265-78, November, 1921.
spaced groups, there remains the task of calculating an index of
the discriminating power of each item. Vincent¹ has reported
a study in which an index of discriminating power of test items
was obtained by calculating the per cent of pupils making a
wrong response to the exercise who had criterion scores equal to
or greater than the median of the criterion scores of the pupils
doing the exercise correctly. Other techniques have been described
by Paterson,² Symonds,³ Wilson, Welsh, and Gulliksen,⁴
Lentz, Hirshstein, and Finch,⁵ and Zubin.⁶ Cook⁷ has reported
a study in which he evaluated five methods applicable to
test items whose scoring is dichotomous. He claims that the bi-serial
r technique⁸ is probably more reliable than the other
methods he investigated experimentally and should be used
where the items approach uniformity in difficulty or represent
¹ Vincent, Leona. “A Study of Intelligence Test Elements,” Teachers College,
Columbia University Contributions to Education, No. 152. New York: Bureau of
Publications, Teachers College, Columbia University, 1924, pp. 9 f.

² Paterson, D. G. Preparation and Use of New Type Examinations. Yonkers-on-Hudson,
New York: World Book Company, 1925, p. 60.

³ Symonds, P. M. Measurement in Secondary Education. New York: The
Macmillan Company, 1927, p. 535.

⁴ Wilson, W. R., Welsh, G., and Gulliksen, H. “An Evaluation of Some Informal
Questions,” Journal of Applied Psychology, 8: 206-14, June, 1924.

⁵ Lentz, T. F., Hirshstein, Bertha, and Finch, J. H. “Evaluation of Methods
of Evaluating Test Items,” Journal of Educational Psychology, 23: 344-50, May,
1932.

⁶ Zubin, Joseph. “The Method of Internal Consistency for Selecting Test
Items,” Journal of Educational Psychology, 25: 345-56, May, 1934.

⁷ Cook, W. W. “The Measurement of General Spelling Ability Involving Controlled
Comparisons between Techniques,” University of Iowa Studies, Studies
in Education, Vol. 6, No. 6. Iowa City, Iowa: University of Iowa, 1932. 112 pp.

The “Index of Discrimination D” described by Cook was used by Lindquist
and Anderson. See

Lindquist, E. F., and Anderson, H. R. “Objective Testing in World History,”
Historical Outlook, 21: 115-22, March, 1930.

See also Lindquist, E. F., and Cook, W. W. “Experimental Procedures in
Test Evaluation,” Journal of Experimental Education, 1: 163-85, March, 1933.

Richardson, M. W. “Notes on the Rationale of Item Analysis,” Psychometrika,
1: 69-76, March, 1936.

Richardson, M. W. “The Relation between the Difficulty and the Differential
Validity of a Test,” Psychometrika, 1: 33-49, June, 1936.

⁸ Bi-serial r represents the correlation between success and non-success on a
given exercise and the criterion measures. Several formulae have been proposed.
See McNamara, W. J., and Dunlap, J. W. “A Graphical Method for
Computing the Standard Error of Bi-serial r,” Journal of Experimental Education,
2: 274-77, March, 1934. One formula is given in Chapter IX, page 236.
high levels of ability.¹ When the scoring of the exercises of a
test is not dichotomous, i.e., when the responses may be classified
into more than two categories, three methods have been proposed:
correlation ratio or eta, McCall method, and Long
method.²
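
Two of the indices named above may be sketched in Python for an item scored dichotomously. The sketch is offered under stated assumptions and is not a reproduction of any author's computing routine: the first function follows the verbal description of Vincent's index given above, and the second uses one common textbook form of bi-serial r, namely the difference between the mean criterion scores of those passing and those failing, divided by the standard deviation of all criterion scores and multiplied by pq/y, where y is the ordinate of the normal curve at the point dividing the proportions p and q. The formula given in Chapter IX may differ in form. All names and data are hypothetical.

```python
from statistics import NormalDist, mean, median, pstdev


def split_by_response(item, criterion):
    """Separate the criterion scores of subjects passing (1) and failing (0)."""
    right = [c for r, c in zip(item, criterion) if r == 1]
    wrong = [c for r, c in zip(item, criterion) if r == 0]
    return right, wrong


def vincent_overlap(item, criterion):
    """Per cent of failing subjects whose criterion scores equal or exceed
    the median criterion score of the passing subjects. A small value
    indicates little overlap and hence good discrimination."""
    right, wrong = split_by_response(item, criterion)
    cutoff = median(right)
    return 100.0 * sum(1 for c in wrong if c >= cutoff) / len(wrong)


def biserial_r(item, criterion):
    """Bi-serial r in a common textbook form (an assumed formula)."""
    right, wrong = split_by_response(item, criterion)
    p = len(right) / len(item)          # proportion passing the item
    q = 1.0 - p                         # proportion failing the item
    normal = NormalDist()
    y = normal.pdf(normal.inv_cdf(p))   # ordinate at the point of division
    return (mean(right) - mean(wrong)) / pstdev(criterion) * (p * q / y)


# Hypothetical data: ten subjects, one item, and their criterion scores.
item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
criterion = [30, 28, 26, 24, 15, 27, 14, 12, 10, 8]

overlap = vincent_overlap(item, criterion)
r_bis = biserial_r(item, criterion)
print(f"overlap = {overlap:.1f} per cent, bi-serial r = {r_bis:.2f}")
```

Both functions require that at least one subject pass and at least one fail the item; an item passed or failed by all subjects yields no index at all, in agreement with the earlier remark that its discriminating value is zero.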

Although the techniques proposed for determining the validity
of test items have not been described, it has perhaps occurred to
the reader that the amount of labor involved is so enormous that
precise test construction is not possible except for one who has a
large and trained clerical staff at his command. This is true, but
it should be noted that the number of hours of labor may be
greatly reduced when calculating and tabulating machines are
available. Lindquist and Cook³ state that for bi-serial r they
devised a procedure by means of which a single operator using
Hollerith tabulating equipment can compute this index for from
50 to 75 items per hour, not including the time required to punch
the cards.

The obtained index of the discriminating power of a test exercise
is a function of the criterion and the experience of the subjects
to which it was administered as well as of the content of
the item. If the criterion measures are lacking validity, the
obtained indices will be in error. If the exercise is unknown to
the subjects and they merely guess in responding to it, the
validity of the item will be zero. If the subjects have been
taught the wrong response, a negative validity will be obtained.


¹ This conclusion is in agreement with the findings of Barthelmess, H. M.
“The Validity of Intelligence Test Elements,” Teachers College, Columbia University
Contributions to Education, No. 505. New York: Bureau of Publications,
Teachers College, Columbia University, 1931, pp. 11 f.

The bi-serial r technique has been employed by Brigham in his evaluation of
items for the scholastic aptitude test of the College Entrance Examining Board.
See Brigham, C. C. A Study of Error. New York: College Entrance Examining
Board, 1932, 384 pp.

² For a description of these methods, see Barthelmess, H. M. Op. cit., pp. 11 f.
This reference also reports a comparative study of these methods together with
other methods useful when the scoring is dichotomous. A later study has been
reported by Long, J. A. “Improved Overlapping Methods for Determining
Validities of Test Items,” Journal of Experimental Education, 2: 264-68, March,
1934.

³ Lindquist and Cook, op. cit., p. 182.
In general, the discriminating power of a test exercise at a
particular grade level depends upon the curriculum and the 
quality of the instruction. Hence, experimental determinations 
of validity should not be thought of as characteristics of the test 
items alone. They are subject to change. Determinations for 
one school population may not be correct for another. 

Experimental determination of the difficulty of test items. The 
relative difficulty of test items for a given population of subjects 
is indicated by the per cents of correct responses, but when it is 
considered desirable to have the items of a test evenly spaced 
with reference to difficulty, it is necessary to secure the transla- 
tion of these per cents into measures of difficulty expressed in 
terms of an ability unit. The procedure commonly employed is 
based on the assumption that in a large unselected group the 
distribution of the ability for which a test is being constructed 
is approximately that of the normal probability curve. In other 
words, if the members of a large unselected group were classified 
on the basis of measures of the ability, the frequencies thus 
obtained would form a normal distribution. A point on the 
scale of this distribution designates a degree or level of ability 
and its location may be specified in terms of the per cent of the
group to the left of the point. For example, if 35 per cent are to
the left of the point, it is located a distance of .385σ to the left of
the mean of the distribution.¹
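
The translation of a per cent into a distance from the mean, ordinarily read from a table of areas under the normal curve, can also be computed directly. The following Python sketch uses the inverse of the normal distribution function from the standard library; the function names are hypothetical, and the second function assumes that the subjects failing an item are those lying to the left of the point on the scale.

```python
from statistics import NormalDist


def sigma_position(percent_to_left):
    """Distance from the mean, in sigma units, of the point having the
    given per cent of a normal distribution to its left. Negative
    values lie to the left of the mean."""
    return NormalDist().inv_cdf(percent_to_left / 100.0)


def item_difficulty_sigma(percent_correct):
    """Scale position of an item from its per cent of correct responses,
    assuming that those who fail lie to the left of the point."""
    return sigma_position(100.0 - percent_correct)


# The example of the text: 35 per cent of the group to the left of the point.
position = sigma_position(35.0)
print(f"{position:.3f} sigma from the mean")

# An item answered correctly by 65 per cent falls at the same point.
print(f"{item_difficulty_sigma(65.0):.3f} sigma from the mean")
```

The computed position agrees, to three decimal places, with the tabled value of .385σ to the left of the mean.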

If a test exercise has been administered to a large unselected 
population, the subjects responding correctly are assumed to be 
the ones whose ability is greater than those who fail to give the 
correct response. Hence, the per cent of correct responses defines 
a point on the scale of ability. Tables have been constructed 
which give the points on the scale of ability corresponding to 
various per cents of correct responses. This scale is usually ex- 
pressed from an arbitrary zero point instead of the mean. 

¹ This value is obtained from a table giving the areas under the normal curve
corresponding to deviations from the mean. If 35 per cent are to the left, there
are 15 per cent between the point and the mean. The deviation corresponding
to this area is .385σ.

See pages 79-80 for discussion of the normal probability curve.




McCall¹ has proposed a point at 5.0σ to the left of the mean,
but 2.5σ and 3.0σ have been used.

Stability of determinations of validity and difficulty of test
items. On page 174, it was noted that changes in the curriculum
or instruction may change the validity of an item. The difficulty
value obtained for a test exercise depends upon the general
status of the population to which it has been administered and
upon the training the subjects have received relative to the exercise.
Hence, determinations of the validity and difficulty of test
items should not be thought of as stable characteristics. This
principle means that a test item which is “good” for one population
may be “poor” for another and that an item which is
“good” at the time of the construction of a test may be “poor”
at a later date. It is also possible that “poor” items may become
“good” items. It is probably true that the quality of test items
does not often change very rapidly, but determinations of
validity and difficulty should not be thought of as highly stable.
Furthermore, the validity index obtained for an item is probably
influenced by the context in which it appears. Hence, the
validity of an item in the final form of the test may not be the
same as in the preliminary form.

Validity of items versus validity of test. The calculation of a
validity index for a large number of items and the selection of
those having the highest indexes for the test makes the process
of test construction impressive, and many testmakers appear to
have assumed that the procedure insures the best test that can
be assembled from the items. The prime consideration in test
construction is that the resulting instrument yield highly valid
and reliable measures and it does not necessarily follow that the
items with the highest validity indexes will make the most valid
test. Investigation² indicates that a test consisting of the “best”

¹ McCall, W. A. How to Measure in Education. New York: The Macmillan
Company, 1922, pp. 274-75.

See also Monroe, W. S. An Introduction to the Theory of Educational Measurements.
Boston: Houghton Mifflin Company, 1923, pp. 96-97.

² Smith, Max. “The Relationship between Item Validity and Test Validity,”
Teachers College, Columbia University Contributions to Education,
items may not be more valid than one consisting of “mediocre”
items. These findings raise the question of the value of elaborate
difficulty and validity analyses of test items. It is likely that
selection of items on the basis of simple procedures will usually
result in a test validity little lower than that attained by the
elaborate procedures described.

C. Description of Test Performances 

Quantitative description of objectively scorable test performances.
In formulating directions for scoring the responses to
the exercises of a test and for combining the credits into a point
score, attention should be given to (1) dimensions of ability,
(2) rate units versus work units, (3) weighting, or relative credit
for correct responses to the various exercises, and (4) correction for
guessing.

1. Recognition of dimensions of ability. A complete description
of a test performance requires determination of its quality
or accuracy, the difficulty of the exercises, and the rate at which
the performance was given. In other words, a performance may
be thought of as three dimensional. Little attention has been
given to the dimensions of test performances, but precise
measurement seems to require it. The number of exercises right
is generally taken as the score. This score represents an unknown
combination of two or all three of the dimensions. Under the
caption of the Law of the Single Variable, Burgess suggested
that the form of the test and its plan of administration be such
that the difficulty and rate of work will be constant in all pupil
performances.¹ This condition is approximated in a timed
sentence dictation spelling test. The Courtis Silent Reading

No. 621. New York: Bureau of Publications, Teachers College, Columbia
University, 1934. 40 pp.

¹ Burgess, May Ayres. Measurement of Silent Reading Ability. New York:
Russell Sage Foundation, 1921, p. 61.

For one proposal for combining speed and quality in the case of handwriting,
see

Gates, A. I. “The Relation of Quality and Speed of Performance: A Formula
for Combining the Two in the Case of Handwriting,” Journal of Educational
Psychology, 15: 129-44, March, 1924.
Test provides for measuring rate and quality of comprehension
independently. A few other testmakers have provided for
separate scores to describe different aspects of the performance.
It seems reasonable that recognition of the Law of the Single
Variable might lead to significant refinements in measuring
human abilities and traits. At least this detail of test construction
offers a challenge to the critical student of educational
measurement.

2. Rate units versus work units. In the case of a speed test,
the performance may be described in terms of the number of
units of work done (usually the number of exercises done correctly)
within the allowed period of time, or in terms of the mean
number of seconds per unit of work. The two methods of description
yield different results and, hence, may lead to conflicting
conclusions, particularly in research on the effect of
practice on individual differences.¹ In such cases, it is advisable
to express the measures in terms of both units of work and
units of time.²
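
The manner in which the two kinds of units may lead to conflicting conclusions can be shown with a small computation. The Python sketch below is illustrative only; the two pupils and their scores are invented, and a one-minute testing period is assumed.

```python
from statistics import harmonic_mean, mean


def seconds_per_unit(units_per_minute):
    """Convert a work score (units done in one minute) to a time score."""
    return 60.0 / units_per_minute


# Hypothetical scores (exercises done correctly per minute) before and
# after practice for a slow pupil, A, and a fast pupil, B.
before = {"A": 10, "B": 30}
after = {"A": 20, "B": 45}

# Gain expressed in units of work: B appears to profit more from practice.
gain_in_work = {k: after[k] - before[k] for k in before}

# Gain expressed in seconds saved per unit of work: A appears to profit more.
gain_in_time = {k: seconds_per_unit(before[k]) - seconds_per_unit(after[k])
                for k in before}

# The mean time per unit for a group is obtained from the harmonic mean of
# the work scores, not from their arithmetic mean.
rates = list(before.values())
mean_time_direct = mean([seconds_per_unit(r) for r in rates])
mean_time_harmonic = 60.0 / harmonic_mean(rates)

print(gain_in_work)
print(gain_in_time)
print(mean_time_direct, mean_time_harmonic)
```

Measured in units of work the pupils draw farther apart with practice, while measured in units of time they draw closer together, which is the kind of conflict against which the text warns.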

3. Weighting or relative credit for correct responses to the various
exercises. If the exercises of the test are approximately equal in
difficulty, the usual practice is to assign the value of one unit to
each. When they vary materially in difficulty, a number of testmakers
have devised a plan of weighting, usually based upon a
¹ Peterson, Joseph, and Barlow, M. C. “The Effects of Practice on Individual
Differences,” The Twenty-Seventh Yearbook of the National Society for the Study
of Education, Part II. Bloomington, Illinois: Public School Publishing Company,
1928, pp. 211-30.

Peterson states on page 214 of the above reference: “The confusion resulting
from comparing absolute gain in amount of work done per unit of time with
amount of time required to do a given piece of work, I clearly pointed out and illustrated
copiously with graphs ten years ago, but the warning then sounded and
later emphasized has been heeded by only a few investigators.” The writings
referred to are

Peterson, Joseph. “Experiments in Ball-Tossing: The Significance of Learning
Curves,” Journal of Experimental Psychology, 2: 178-224, 1917.

Peterson, Joseph. “Thurstone's Measure of Variability in Learning,” Psychological
Bulletin, 15: 452-55, 1918.

Peterson, Joseph. “Johnson’s Measurement of Rate of Improvement under
Practice,” Journal of Educational Psychology, 15: 271-75, May, 1924.

² Conversion from units of work to units of time can easily be accomplished
by computation of “harmonic means.” See Odell, C. W. Educational Statistics.
New York: The Century Company, 1925, pp. 97-104.
measure of the relative difficulty of the several exercises. In the
case of tests of the scaled type, a pupil will not ordinarily do all
of the exercises up to a given point of the scale and then fail in
all beyond this point. The typical performance is one in which
there is a scattering of correct responses beyond the point of the
first incorrect one. This situation has caused some testmakers
to devise a technique for computing a difficulty score which
involves weighting.¹ It appears, however, that a system of
weighting does not affect the reliability of test scores to a marked
degree and the usual practice is to take the number of exercises
done correctly as the pupil's score even though there may be
marked variations in difficulty.²

Most of those who have studied the effect of weighting have
employed reliability as the criterion. The effect of weighting
upon the validity of the measures has received relatively little
attention. Furthermore, it appears that the social value of an

¹ Kelley, T. L. “Thorndike’s Reading Scale, Alpha 2, Adapted to Individual
Testing,” Teachers College Record, 18: 253-60, May, 1917.

Kelley, T. L. “A Simplified Method of Using Scaled Data for Purposes of
Testing,” School and Society, 4: 34-37, July, 1916.

Van Wagenen, M. J. “Table for Computing Mean Individual Scores in
Educational Scales,” Teachers College Record, 21: 441-51, November, 1920.

² The interested reader should consult the following reports of research on the
problem. Corey contends that when scores are transmuted into grades,
weighting has a significant influence on the grades assigned. Odell, Peatman,
and Potthoff and Barnett contend that the effect of weighting is of some, but not
of great, significance in its effect on grades.

Corey, S. M. “The Effect of Weighting Exercises in a New Type of Examination,”
Journal of Educational Psychology, 21: 383-85, May, 1930.

Douglass, H. R., and Spencer, P. L. “Is It Necessary to Weight Exercises in
Standard Tests?” Journal of Educational Psychology, 14: 109-12, February,
1923.

Monroe, W. S. “The Description of the Performances of Pupils on Exercises
of Varying Difficulty,” School and Society, 15: 341-43, March 25, 1922.

Odell, C. W. “Further Data Concerning the Effect of Weighting Exercises in
New Type Examinations,” Journal of Educational Psychology, 22: 700-04,
December, 1931.

Peatman, J. G. “The Influence of Weighted True-False Test Scores on
Grades,” Journal of Educational Psychology, 21: 143-47, February, 1930.

Potthoff, E. F., and Barnett, N. E. “A Comparison of Marks Based upon
Weighted and Unweighted Items in a New Type Examination,” Journal of
Educational Psychology, 23: 92-98, February, 1932.

Scates, D. E., and Noffsinger, F. R. “Factors Which Determine the Effectiveness
of Weighting,” Journal of Educational Research, 24: 280-85, November,
1931.
exercise should receive consideration as well as its difficulty.
Frequently these criteria are incompatible as may be shown by
considering two exercises, the first, calling for the date of the
signing of the Declaration of Independence, and the second, calling
for the date of a minor battle of the Civil War. The first
exercise calls for information of much greater social value than
the second. One would be considered ignorant indeed if he did
not know the significance of the date, 1776. On the other hand,
frequent contact with the date and its significance makes the
first exercise a very easy one for most eighth-grade pupils. The
second date is of much less social value, but it would probably be
known by very few pupils. On the basis of the criterion of social
value, the first exercise merits the greater weight, but on the
basis of difficulty, the second exercise would be given the greater
weight.

4. Correction for guessing. When the number of possible responses
to an exercise is limited, it is possible to make the correct
response merely by guessing. Hence, if the subjects to whom the
test is administered guess when they do not know the response
to make, the “number right” will tend to be too large as a score.
The formula usually used to secure a corrected score is

$$\text{Score} = \text{Number right} - \frac{\text{Number wrong}}{n - 1}$$

The symbol, n, stands for the number of possible responses
to an exercise. In the case of true-false and other alternative
test exercises, the formula becomes:

$$\text{Score} = \text{Number right} - \text{Number wrong}$$

For exercises requiring a choice from three possible answers,
the formula becomes:

$$\text{Score} = \text{Number right} - \frac{\text{Number wrong}}{2}$$

When the number of possible responses is increased to five or 
more, the correction is probably not essential. After reviewing 
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the research relating to correction for guessing, Ruch ^ concluded 
that the question of the effect upon reliability is still debatable, 
but that correction for guessing appears to increase the validity 
of the scores It is likely that the effect of correction for guessing 
IS conditioned by the character of the exercise and the instruc- 
tions relative to guessing.^ 
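
The correction formula above can be sketched in a few lines of Python. This is only an illustration: the function name `corrected_score` and the sample counts are assumptions made for the example, not part of the text, and omitted exercises are taken to count neither as right nor as wrong.

```python
def corrected_score(number_right, number_wrong, n_choices):
    """Score = Number right - Number wrong / (n - 1).

    n_choices is n, the number of possible responses to each exercise.
    Omitted exercises are assumed to be excluded from both counts.
    """
    if n_choices < 2:
        raise ValueError("an exercise needs at least two possible responses")
    return number_right - number_wrong / (n_choices - 1)


# True-false exercises (n = 2): the score is rights minus wrongs.
print(corrected_score(40, 10, 2))  # 30.0

# Three-choice exercises (n = 3): half of the wrongs are deducted.
print(corrected_score(40, 10, 3))  # 35.0
```

As n grows, the deduction per wrong answer shrinks, which agrees with the remark that the correction is probably not essential for five or more possible responses.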

Quantitative description of recorded performances by means
of quality scales. Performances such as samples of handwriting
or written compositions cannot be scored as “right” or “wrong.”
They must be described in terms of degrees of quality. This
description may be accomplished by constructing a quality
scale and then matching the performances with a step of this
scale. The essential steps in the construction of a quality scale
are as follows: (1) Collecting sample performances which
represent as wide a range of quality as possible; (2) Selecting from
this collection a limited number of samples (10 to 20) which
represent the entire range of quality, and which differ from each
other by approximately equal increments of quality; (3)
Determining the quantitative description of the quality of each sample
with reference to an established zero point, and arranging the
samples in the form of a scale.

The procedures for the second and third of these steps have
been described in several texts on educational measurement³
and need not be considered here. The merit of the resulting
quality scale depends upon the original collection of sample
performances. Insofar as possible they should differ only with
respect to the quality to be measured. For example, if the scale
is to measure handwriting, the children should write the same
text and employ the same style of writing. If the scale is to
measure composition, the pupils should write on the same topic.
Another consideration is that all degrees of excellence should be

1 Ruch, G. M. The Objective or New-Type Examination. Chicago: Scott,
Foresman and Company, 1929, pp. 318-57.

2 See Wood, E. P. “Improving the Validity of Collegiate Achievement Tests,”
Journal of Educational Psychology, 18: 18-25, January, 1927.

3 For example, see Monroe, W. S. An Introduction to the Theory of
Educational Measurements. Boston: Houghton Mifflin Company, 1923, pp. 135 f.
represented. For example, in the case of drawing, one sample
should approximate zero drawing ability. It should be
characterized as an attempt to draw, but an attempt which represents
no accomplishment. The number of samples collected should be
large so that one can be reasonably sure that all degrees of ability
are represented.

The matching of the test performances with the steps of the
scale is a subjective process which contributes to the unreliability
of the measures secured. Several investigators have compared
ratings by means of a scale with estimates of quality made
without the use of a scale. The evidence is conflicting and hence
disappointing. Odell, after summarizing investigations of the
reliability of ratings made by means of composition scales and
considering data with respect to the reliability of a set of scales
for rating pupils’ answers to thought questions, concluded that
“if the scales themselves possess high merit and if those who
employ them do so after the best possible preparation and in the
best possible manner,” the reliability of the resulting measures
of quality will be higher than that of estimates made without
the use of a scale.¹

Derived scales of measurement. The scale of measurement
on which the point scores are expressed is determined by the
structure of the test and the manner of computing the score.
Usually the zero point is arbitrary and does not represent “not
any of the thing measured.” McCall has proposed a scale of
measurement having the zero point at $5.0\sigma$ below the mean score
of a large and unselected population of 12-year-old children.
The unit of measurement is $.1\sigma$ and the measures expressed on
this scale are known as T-scores.³ Test scores can be converted

1 Odell, C. W. “The Use of Scales for Rating Pupils’ Answers to Thought
Questions,” University of Illinois Bulletin, Vol. 26, No. 36, Bureau of
Educational Research Bulletin, No. 46. Urbana: University of Illinois, 1929, p. 28.

2 For an explanation of the principle involved, see pages 82-83.

Ruch and Stoddard have pointed out that this principle was implied in
Galton’s treatment of correlation.

Ruch, G. M., and Stoddard, G. D. Tests and Measurements in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company, 1927,
p. 351.

3 For discussion of the T-score procedure, the interested reader should consult





into a number of other types of derived measures.
Transmutation into age scores is a common procedure. Familiar examples
are mental age, educational age, achievement age, arithmetic
age, and reading age. A given reading age of 115 months means
that the pupil who obtained this score did as well on the test as
the median pupil whose chronological age is 115 months, or 9
years 7 months.¹ When the median scores for a series of grades
and for each month of the school year have been determined,
point scores may be transmuted into grade scores. A grade score
of 5.6 obtained by a pupil would indicate that his achievement on
the test was equivalent to that of the median pupil in the sixth
month of the fifth grade. The use of age scores and grade scores rests on
the assumption that the groups whose scores are compared have
been subjected to comparable educational experience. That is
to say, the pupils have been subjected to the same curriculum
and school organization.
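
The T-score scale described above (zero point $5.0\sigma$ below the mean, unit $.1\sigma$) can be illustrated with a short Python sketch. It is a simplified linear conversion that assumes the mean and standard deviation of the reference group are already known; the function name `t_score` and the sample values are hypothetical, and McCall's full procedure for deriving the scale is not reproduced here.

```python
def t_score(raw, ref_mean, ref_sd):
    """Express a raw score on a scale whose zero point lies 5 sigma below
    the reference-group mean and whose unit is one tenth of sigma.

    (raw - (mean - 5*sd)) / (0.1*sd) simplifies to 50 + 10*(raw - mean)/sd.
    """
    if ref_sd <= 0:
        raise ValueError("the reference standard deviation must be positive")
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd


# Hypothetical reference group: mean 60, standard deviation 8.
print(t_score(60, 60, 8))  # 50.0  (a score at the mean)
print(t_score(68, 60, 8))  # 60.0  (one sigma above the mean)
print(t_score(20, 60, 8))  # 0.0   (five sigma below the mean)
```

On this scale a score at the reference mean is 50, and each ten points corresponds to one standard deviation of the reference group.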

the references given below. In addition to references dealing with McCall’s
procedure, some are included relative to the extension of McCall’s procedure to
several grade groups by Pintner and by Thurstone. Holzinger’s criticisms and
Thurstone’s replies to his critic are included. The Thurstone and Holzinger
references are grouped together in the order in which they appeared.

Kelley, T. L. “Comparable Measures,” Journal of Educational Psychology,
5: 589-95, December, 1914.

McCall, W. A. “A Proposed Uniform Method of Scale Construction,”
Teachers College Record, 22: 31-51, January, 1921.

Monroe, W. S. An Introduction to the Theory of Educational Measurements.
Boston: Houghton Mifflin Company, 1923, pp. 150-52.

Woodworth, R. S. “Combining the Results of Several Tests: A Study in
Statistical Method,” Psychological Review, 19: 97-123, March, 1912.

Thurstone, L. L. “A Method of Scaling Psychological and Educational
Tests,” Journal of Educational Psychology, 16: 433-51, October, 1925.

Thurstone, L. L. “The Scoring of Individual Performance,” Journal of
Educational Psychology, 17: 446-57, October, 1926.

Thurstone, L. L. “The Unit of Measurement in Educational Scales,” Journal
of Educational Psychology, 18: 505-24, November, 1927.

Holzinger, K. J. “Some Comments on Professor Thurstone’s Method of
Determining the Scale Values of Test Items,” Journal of Educational Psychology,
19: 112-17, February, 1928.

Thurstone, L. L. “Comment by Professor L. L. Thurstone,” Journal of
Educational Psychology, 19: 117-24, February, 1928.

Holzinger, K. J. “Reply to Professor Thurstone,” Journal of Educational
Psychology, 19: 124-26, February, 1928.

Thurstone, L. L. “Scale Construction with Weighted Observations,” Journal
of Educational Psychology, 19: 441-53, October, 1928.

1 For further discussion of age scores, see Monroe, op. cit., pp. 155-56.
The computation of the percentile points of a distribution of
scores makes possible the transmutation of scores into percentile
measures. For example, if a score of 83 falls within the interval
between the 47 and 48 percentile, it may be expressed as a
percentile score of 47. The pupil’s ability may be described
as of the 47 percentile.¹ Although percentile scores are expressed
in terms that can be easily understood, they do not possess the
qualities which are possessed by derived scores expressed in
terms of the variability of a chronological age group. The
unit is not constant. The distribution is divided so that equal
areas are obtained in calculating percentile scores. The
divisions do not mark equal distances on the base line of the
distribution. The result is that the difference between the degrees
of ability represented by a 45 percentile score and a 50 percentile
score is not at all equal to the difference between the degrees of
ability represented by a 90 percentile score and a 95 percentile
score. The latter is much larger. Percentile scores are useful,
however, when precise comparison is not attempted, and at the
high school level where age and grade scores are not applicable.
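
The inequality of percentile units can be made concrete with a small Python sketch. It assumes, purely for illustration, that the ability measured is normally distributed; the function name `percentile_gap_in_sigma` is hypothetical. The sketch measures the distance along the base line, in standard deviation units, between two percentile points.

```python
from statistics import NormalDist


def percentile_gap_in_sigma(lower_pct, upper_pct):
    """Distance on the base line (in sigma units) between two percentile
    points of a normal distribution. Normality is an assumption here."""
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    return unit_normal.inv_cdf(upper_pct / 100) - unit_normal.inv_cdf(lower_pct / 100)


middle_gap = percentile_gap_in_sigma(45, 50)  # roughly 0.13 sigma
upper_gap = percentile_gap_in_sigma(90, 95)   # roughly 0.36 sigma

print(round(middle_gap, 2), round(upper_gap, 2))
```

Under this assumption the step from the 90 to the 95 percentile covers nearly three times the distance covered by the step from the 45 to the 50 percentile, although each step is five percentile points.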

Composite score of a battery of tests. When the measuring
instrument consists of several sub-tests and a single composite
score is desired, it is necessary to derive a procedure for
computing it. If it is desired to give equal weight to the measures
resulting from the several sub-tests, the scores may be reduced
to a common basis² and then added. If criterion measures are
available, the weights for the maximum validity may be
obtained by calculating the multiple regression equation.³
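
A minimal Python sketch of the equal-weight composite follows. It assumes that the "common basis" is the standard score, that is, the deviation from the sub-test mean divided by the sub-test standard deviation; the function names and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(scores):
    """Reduce one sub-test to standard scores (mean 0, standard deviation 1)."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        raise ValueError("a sub-test with no spread cannot be standardized")
    return [(x - m) / s for x in scores]


def composite_scores(subtests):
    """Add the standard scores of each pupil across sub-tests.

    subtests is a list of score lists, one list per sub-test, with the
    pupils in the same order in every list.
    """
    standardized = [standard_scores(scores) for scores in subtests]
    return [sum(pupil) for pupil in zip(*standardized)]


arithmetic = [10, 20, 30]    # hypothetical point scores, small spread
reading = [100, 300, 200]    # hypothetical point scores, large spread
composite = composite_scores([arithmetic, reading])
print([round(c, 2) for c in composite])
```

Adding the raw scores would let the reading sub-test dominate because of its larger spread. After standardizing, the second and third pupils, each of whom is highest on one sub-test and average on the other, receive the same composite.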

The psychological questionnaire.⁴ The psychological
questionnaire is used to measure human traits (attitudes, interests,
ideals, and other general patterns of conduct) by asking the

1 For further discussion of percentile scores, see Monroe, op. cit., p. 154.

Otis, A. S. Statistical Method in Educational Measurement. Yonkers-on-
Hudson, New York: World Book Company, 1925, pp. 24-29, 95-100, 124-131.

Ruch and Stoddard, op. cit., pp. 347-50.

2 See pages 82 f.

3 See pages 324 f.

4 For a more extended discussion see Symonds, P. M. Diagnosing Personality
and Conduct. New York: The Century Company, 1931, pp. 122 f.





pupil questions relative to what he has done, how he is
accustomed to do certain things, his likes, beliefs, choices, wishes,
preferences, interests, and the like. The distinction between it
and a performance test is that the latter attempts to secure
the functioning of the ability whose measurement is desired or
the functioning of a closely correlated ability. In the
psychological questionnaire there is no such attempt. It merely asks
questions whose answers experience has shown to be
indicative of the status of the trait whose measurement is
desired. Some of the questions call for deliberation and the
expression of a judgment. Others ask for immediate
expression of choices or other reactions. Usually the questions are
expressed in a form such that the scoring of the answers is
objective.

The only essential criterion for devising and selecting the
items of a psychological questionnaire is their effectiveness in
indicating individual differences in the trait whose measurement
is desired. A question may appear to be inconsequential or
even ridiculous,¹ but if experience shows that it is effective,
its appropriateness has been demonstrated. In general,
however, a superior initial selection of questions will be secured by
analyzing the behavior associated with the trait and attempting
to formulate pertinent questions.

A psychological questionnaire may be used as a means of
securing a controlled interview for the purpose of diagnosing
children individually. When used for this purpose by a
competent person, a definite plan of scoring may not be advisable,
but when groups of subjects are being studied, a numerical
score is usually desired. The criterion of the validity of a method
of scoring is the effectiveness of the resulting measures in
revealing individual differences in the trait whose measurement
is desired. In general no answer can be considered as wrong.

1 For illustrations see Wells, F. L. “Report on a Questionnaire Study of
Personality Traits with a College Graduate Group,” Mental Hygiene, 9: 113-27,
January, 1925.

Symonds, P. M. “A Studiousness Questionnaire,” Journal of Educational
Psychology, 19: 152-67, March, 1928.
The questions do not call for facts as in the case of performance
tests. Any response to an item of a psychological questionnaire
may be significant, and in the present stage of the development
of this field of measurement a person constructing a measuring
instrument of this type should try out several methods of
scoring, at least all promising ones, and calculate the correlation
between the resulting sets of scores and the criterion.

A number of psychological questionnaires were used in the
comprehensive investigation of Terman and his co-workers,
Mental and Physical Traits of a Thousand Gifted Children.¹
In Chapters XIII, XIV, and XV are described instruments for
the measurement of scholastic, occupational, play, and reading
interests. In Chapters XVII and XVIII are described
questionnaires for the measurement of character and personality
traits. Voelker’s ten tests of trustworthiness, Cady’s measures
of incorrigibility, and the Woodworth-Cady Questionnaire for
the measurement of emotional stability are also described. The
Woodworth-Cady Questionnaire is reproduced in full on pages
501 to 505. An illustration of the efforts to measure the less
tangible and definite controls of conduct is afforded by an
attempt to measure “Faith in God.”²

1 Terman, L. M., et al. “Mental and Physical Traits of a Thousand Gifted
Children,” Genetic Studies of Genius, Vol. I. Stanford, California: Stanford
University Press, 1925. 648 pp.

2 Donnelly, Harold I. “Measuring Certain Aspects of Faith in God as Found
in Boys and Girls Fifteen, Sixteen, and Seventeen Years of Age,” A Thesis in
Education Presented in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy. Philadelphia: University of Pennsylvania, 1931. 118 pp.

The reader interested in psychological questionnaires may find the following
references helpful:

Droba, D. D. “A Scale of Militarism-Pacifism,” Journal of Educational
Psychology, 22: 96-111, February, 1931.

Garretson, O. K. “Relationships between the Expressed Preferences and the
Curricular Abilities of Ninth-Grade Boys,” Journal of Educational Research,
23: 124-32, February, 1931.

Lincoln, E. A., and Shields, F. J. “An Age Scale for the Measurement of
Moral Judgment,” Journal of Educational Research, 23: 193-97, March, 1931.

Thurstone, L. L. “A Scale for Measuring Attitude toward the Movies,”
Journal of Educational Research, 22: 89-94, September, 1930.

Watson, Goodwin. “Happiness among Adult Students of Education,”
Journal of Educational Psychology, 21: 79-111, February, 1930. An example of
measurement by self-estimate.
D. Estimating the Excellence of Measuring Instruments

The reliability and validity of a measuring instrument. After
a measuring instrument has been constructed, including the
formulation of directions for administering it and for obtaining
the score, it remains to determine how accurately the
instrument measures the ability or trait specified by its expressed or
implied function. In Chapter V four types of errors were noted:
(1) variable errors of measurement, (2) variable errors of
validity, (3) systematic errors of measurement, (4) systematic errors
of validity. Since the systematic error in the scores yielded by
a test fluctuates with the manner of its administration and other
conditions not connected with its structure, indices of only
variable errors of measurement and variable errors of validity
can be associated with a measuring instrument. The reliability
of a test has reference to the variable errors of measurement and
the validity of a test to the variable errors of validity.¹ When
used in a technical sense, these terms have no connection with
systematic errors.

The association of an index of variable errors of measurement
or of the variable errors of validity with a test implies that the
“average” magnitude of the errors to be expected in the case
of a given test is a fixed characteristic of the measuring
instrument. This implication is not necessarily true. In the
case of a particular administration of a given test, the actual
variable errors of measurement or the actual variable errors of
validity may be materially greater than the index indicates.
It is also possible that they may be less.² Hence, the indices
dealt with in the following pages should be thought of as
representing estimates of the magnitude of the variable errors to be
expected rather than as fixed characteristics of a test or other
measuring instrument.

1 Although “validity” may be defined so that it refers to only variable errors
of validity, a calculated index of validity usually is a measure of variable errors
of measurement and variable errors of validity.

2 For a study of the effect of practice upon the reliability of a test, see Anastasi,
Anne. “The Influence of Practice upon Test Reliability,” Journal of Educational
Psychology, 25: 321-35, May, 1934.
Determining an index of the magnitude of the variable errors
of measurement to be expected in the scores yielded by a given
instrument. The determination of an index of the magnitude
of the variable errors of measurement¹ to be expected in the
measures obtained by means of a given instrument is based
upon two independent sets of scores for a group of subjects
representative of the population for which the test is designed.
Three methods are used for securing these scores: (1) repeating
the administration of the test, (2) administering two forms of
the test, and (3) securing two series of scores by dividing the
test into two equivalent parts, usually by using the
odd-numbered items as the basis of one score and the even-numbered
ones as the basis of the other. None of these methods is
entirely satisfactory. Both sets of scores should be typical of
testing conditions that may be expected to prevail when
the test is administered to a group of subjects. When a
test is readministered, the conditions are likely not to be
typical because the subjects may remember some of the
exercises and their responses. Their attitude may also be
affected. When two forms of a test are used, there is likely to be
some “transfer” from the first to the second testing.
Furthermore, unless very carefully constructed, the two forms are
likely not to be entirely equivalent. When the two sets of
scores are obtained by dividing a test into halves, the
testing conditions are identical and hence the effect of typical
variations in testing conditions is eliminated. Furthermore,
this procedure introduces other difficulties that will be noted
later.

1. Coefficients of reliability. The term “reliability coefficient”
appears to have been used first by Spearman in 1910.
Reliability coefficients, however, appeared in his writings as early as
1904 and were defined as “the average correlation between

1 The variable error of measurement is the difference between an obtained
score and the corresponding theoretical true score ($X_\infty$), which is defined as the
mean of a large number of scores made by the same subject on equivalent forms
of the test, each score being corrected for systematic error. See pages 130-32 for
discussion of causes of variable errors of measurement.
one and another of these several independently obtained series
of values.”¹ In other words, the basis of a reliability coefficient
was to be two independently obtained sets of measures of the
same thing. This is equivalent to saying that the basis should
be two sets of measures the pairs of which differ only because
of the presence of variable errors of measurement. A reliability
coefficient does not measure any systematic error that may be
present.² When a coefficient of reliability is calculated from
IQ’s or other quotient scores, “spurious correlation” is
introduced. For example, if an intelligence test with zero
reliability is administered to a population heterogeneous with
reference to chronological age, the reliability of the IQ’s will be
.50. For references dealing with “spurious correlation” see
Chapter XI, page 388.

The coefficient of correlation between the two sets of paired
scores obtained by either of the first two of the above procedures
is commonly called the coefficient of reliability ($r_{1I}$).³ As pointed
out above, the paired measures obtained either by repeating
a test or by using two forms of a test do not completely qualify
as “two independently obtained sets of measures of the same
thing.” In general, the coefficient calculated from scores
obtained by repeating the test is likely to be slightly too large
and one calculated from scores obtained from duplicate forms
slightly too small.

1 Spearman, C. “The Proof and Measurement of Association between Two
Things,” American Journal of Psychology, 15: 90-101, January, 1904.

Spearman, C. “General Intelligence Objectively Determined and Measured,”
American Journal of Psychology, 15: 253-93, April, 1904.

For a definition of reliability in terms of variance ratio, see Dunlap, J. W.
“Comparable Tests and Reliability,” Journal of Educational Psychology,
24: 442-53, September, 1933. This is an excellent critical discussion of the
techniques for determining reliability. For an explanation of the variance ratio
see Chapter XI.

2 For an illuminating reference on this point, see Daniel, R. P. “Basic
Considerations for Valid Interpretations of Experimental Studies Pertaining to
Racial Differences,” Journal of Educational Psychology, 23: 15-27, January,
1932.

3 Kelley favors the use of the term “retesting coefficient” for a coefficient
obtained by the first procedure; see Kelley, T. L. Interpretation of Educational
Measurements. Yonkers-on-Hudson, New York: World Book Company, 1927,
pp. 39-40. “Consistency coefficient” has also been proposed.
The Spearman-Brown formula¹ is employed as a means of
calculating a coefficient of reliability from the coefficient of
correlation between the two sets of scores obtained by the
third procedure. Using the symbol, $r_{\frac{1}{2}\frac{I}{II}}$, to designate the
coefficient of correlation between the two halves of the test, the
formula is

$$r_{1I} = \frac{2\,r_{\frac{1}{2}\frac{I}{II}}}{1 + r_{\frac{1}{2}\frac{I}{II}}}$$

The use of this formula has been questioned. Shen² has

1 The formula was published simultaneously by Spearman and by Brown.

Spearman, C. “Correlation Calculated from Faulty Data,” British Journal
of Psychology, 3: 271-95, 1910.

Brown, W. “Some Experimental Results in the Correlation of Mental
Abilities,” British Journal of Psychology, 3: 296-322, 1910.

2 Shen, Eugene. “The Standard Error of Certain Estimated Coefficients of
Correlation,” Journal of Educational Psychology, 15: 462-65, October, 1924.
See also the criticism by Holzinger and the reply of Shen:

Holzinger, K. J., and Clayton, Blythe. “Further Experiments in the
Application of Spearman’s Prophecy Formula,” Journal of Educational Psychology,
16: 289-99, May, 1925.

Shen, Eugene. “A Note on the Standard Error of the Spearman-Brown
Formula,” Journal of Educational Psychology, 17: 93-94, February, 1926.

See also:

Lanier, L. H. “Prediction of the Reliability of Mental Tests of Special
Abilities,” Journal of Experimental Psychology, 18: 69-113, February, 1927.

Holzinger, K. J. “Note on the Use of the Spearman-Brown Prophecy Formula
for Reliability,” Journal of Educational Psychology, 14: 302-05, May, 1923.

Douglass, H. R., and Cozens, F. W. “On Formula for Estimating the
Reliability of Test Batteries,” Journal of Educational Psychology, 20: 369-77, May, 1929.

Farnsworth, P. R. “The Spearman-Brown Prophecy Formula and the
Seashore Tests,” Journal of Educational Psychology, 19: 586-88, November, 1928.

Kelley, T. L. “The Applicability of the Spearman-Brown Formula for the
Measurement of Reliability,” Journal of Educational Psychology, 16: 300-03,
May, 1925.

Kelley, T. L. “Note on the Reliability of a Test: A Reply to Dr. Crum’s
Criticism,” Journal of Educational Psychology, 15: 193-204, April, 1924.

Remmers, H. H. “The Equivalence of Judgments in the Sense of the
Spearman-Brown Formula,” Journal of Educational Psychology, 22: 66-71, January,
1931.

Remmers, H. H., Shock, N. W., and Kelly, E. L. “An Empirical Study of the
Validity of the Spearman-Brown Formula as Applied to the Purdue Rating
Scale,” Journal of Educational Psychology, 18: 187-95, March, 1927.

Ruch, G. M., Ackerson, Luton, and Jackson, J. D. “An Empirical Study of
the Spearman-Brown Formula as Applied to the Educational Test Material,”
Journal of Educational Psychology, 17: 309-13, May, 1926.
shown that the standard error of the estimated coefficient is
greater than that of an equivalent reliability coefficient
calculated directly. In practice, however, this condition is likely
to be less significant than failure to satisfy completely the
assumptions on which the Spearman-Brown formula is based.
The formula is a special case of the one for the correlation of
sums,¹ which simplifies to the above form only when certain
conditions are satisfied. The reliability coefficient $r_{1I}$ may be
written $r_{\left(\frac{1}{2}+\frac{1}{2}\right)\left(\frac{I}{II}+\frac{I}{II}\right)}$. When expressed in this form, four sets
of measures are apparent, those from the two halves of Test 1
and those from the two halves of Test I. Two assumptions are
made: (1) the standard deviations of these four sets of measures
are equal, (2) the various intercorrelations are equal. In
applying the Spearman-Brown formula, one has only two sets
of measures, one designated as $X_{\frac{1}{2}}$ and the other as $X_{\frac{I}{II}}$. If
the standard deviations of these two sets of measures are equal,
the $r_{1I}$ obtained is the coefficient of correlation between two
hypothetical tests, one formed by doubling the half designated as
$\frac{1}{2}$ and the other formed by doubling the half designated as
$\frac{I}{II}$. The obtained $r_{1I}$ is the correct $r_{1I}$ provided the measures
resulting from the added portions have the same standard
deviations as the measures yielded by the original halves and
provided further the various intercorrelations are equivalent.
Brownell² has shown that in the case of objective tests
constructed by an instructor for measuring the achievement of his
students, different methods of forming the halves result in a

Wood, B. D. “Studies of Achievement Tests, Part III, Spearman-Brown
Reliability Predictions,” Journal of Educational Psychology, 17: 263-69, April, 1926.

Use of the formula is facilitated by the table prepared by Edgerton and Toops:

Edgerton, H. A., and Toops, H. A. “A Table for Predicting the Validity and
Reliability Coefficients of a Test When Lengthened,” Journal of Educational
Research, 18: 225-34, October, 1928.

1 Kelley, T. L. Statistical Method. New York: The Macmillan Company,
1923, p. 197.

2 Brownell, W. A. “On the Accuracy with Which Reliability May Be Measured
by Correlating Test Halves,” Journal of Experimental Education, 1: 204-15,
March, 1933.
wide range of values for $r_{1I}$. This condition is probably due,
at least in part, to failure to satisfy the assumptions just noted.
It is likely that they are not fully satisfied in more carefully
constructed tests.
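
The odd-even procedure and the Spearman-Brown step can be sketched in Python as follows. The sketch assumes each exercise is marked 1 (right) or 0 (wrong); the function names and the five pupils' marks are hypothetical, and the product-moment correlation is written out so that the example needs only the standard library.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment coefficient of correlation between paired scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_halves):
    """Estimate full-test reliability from the correlation between halves."""
    return 2 * r_halves / (1 + r_halves)


def split_half_reliability(item_marks):
    """item_marks holds one list of 0/1 marks per pupil, in item order."""
    odd_half = [sum(pupil[0::2]) for pupil in item_marks]   # items 1, 3, 5, ...
    even_half = [sum(pupil[1::2]) for pupil in item_marks]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical marks of five pupils on a six-exercise test.
marks = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(marks)
print(round(reliability, 2))
```

Forming the halves differently (for example, first half against second half) would change the half-test correlation and therefore the estimate, which is the variation among methods of forming halves reported by Brownell.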

A reliability coefficient obtained from halves of a test and
the application of the Spearman-Brown formula does not
represent the same thing as one obtained from two applications of
the test or the application of two forms of the test. Variations
in the scores of individual pupils are due in part to differences
in mind-set, effort, and the like. Such differences are probably
eliminated in the case of two halves of the same test, but they
are likely to be significant when there are two separate testings,
especially when there is an interval of a day or more between
them. Hence, the coefficient derived from the two halves of a
test and the Spearman-Brown formula may be expected to be
somewhat larger than would be obtained from a second
administration of the test after an interval of at least a few hours.¹

The reliability of psychological questionnaires has been
determined by the techniques used in determining the reliability
of ordinary tests. Some investigators have administered their
questionnaires twice to the same group of subjects and
correlated the results. The weakness of this method is that the
subjects may remember some of their initial responses and
deliberately change them, thus reducing the reliability coefficient
obtained. The use of equivalent forms is most desirable, but
it is often exceedingly difficult to construct equivalent forms
of a psychological questionnaire.² Symonds, after summarizing

1 For experimental verification, see Foran, T. G. “A Note on Methods of
Measuring Reliability,” Journal of Educational Psychology, 22: 383-87, May,
1931.

Jordan, R. C. “An Empirical Study on the Reliability Coefficient,” Journal
of Educational Psychology, 26: 416-26, September, 1935.

2 Cady, following the suggestion of Kelley, rewrote his questionnaire so that
each item required the opposite response. He does not consider this device as
effective as the selection of a number of questions and the careful division of
these into two forms to be administered at different sittings.

Cady, V. M. “The Estimation of Juvenile Incorrigibility,” Journal of
Delinquency Monographs, No. 2. Whittier, California: Whittier State School, 1923,
140 pp.
the reliability coefficients of a number of psychological
questionnaires, concludes that these instruments compare favorably
in reliability with ordinary tests of the recall, multiple-response,
and true-false types.¹

The reliability of a battery of tests may be computed from
the reliabilities and intercorrelations of the sub-tests by means
of the formula for the correlation of sums.² If the sub-tests are
approximately equivalent in reliability and the intercorrelations
are high, the general form of the Spearman-Brown formula will
give nearly the same result. When these conditions are not
satisfied, the result obtained will be spuriously high.³
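
The comparison just described can be illustrated with a Python sketch. It assumes one common way of applying the correlation of sums to a battery: the sub-test scores are added with unit weights, and the variable errors of different sub-tests are taken to be uncorrelated. Under those assumptions the reliability of the sum is the ratio shown in the code. The function names and the numerical values are hypothetical, and the expression is not quoted from the text.

```python
def battery_reliability(sds, reliabilities, intercorrelations):
    """Reliability of an unweighted sum of sub-test scores.

    Assumes errors of different sub-tests are uncorrelated, so that
        r = (sum r_ii*s_i^2 + sum_{i!=j} r_ij*s_i*s_j)
            / (sum s_i^2 + sum_{i!=j} r_ij*s_i*s_j).
    intercorrelations[i][j] is the correlation between sub-tests i and j.
    """
    k = len(sds)
    cross = sum(
        intercorrelations[i][j] * sds[i] * sds[j]
        for i in range(k) for j in range(k) if i != j
    )
    consistent = sum(reliabilities[i] * sds[i] ** 2 for i in range(k))
    total = sum(s ** 2 for s in sds)
    return (consistent + cross) / (total + cross)


def spearman_brown_general(r, k):
    """General Spearman-Brown form for a test lengthened k times."""
    return k * r / (1 + (k - 1) * r)


sds = [1.0, 1.0, 1.0]
rels = [0.6, 0.6, 0.6]

# Intercorrelations as high as the reliabilities: the two results agree.
high = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]]
# Lower intercorrelations: the Spearman-Brown value is too high.
low = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]

print(round(battery_reliability(sds, rels, high), 3))
print(round(battery_reliability(sds, rels, low), 3))
print(round(spearman_brown_general(0.6, 3), 3))
```

With three sub-tests of reliability .60, the Spearman-Brown value matches the correlation-of-sums value only when the intercorrelations are also .60; with intercorrelations of .30 it overstates the battery reliability.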

2. Index of reliability. The coefficient of reliability, $r_{1I}$, is an
index of $X_1 - X_I$ rather than of $X_1 - X_\infty$. A measure of the
latter may be obtained by calculating the index of reliability
as indicated by the formula

$$r_{1\infty} = \sqrt{r_{1I}}$$

in which $r_{1\infty}$ represents the coefficient of correlation between
a set of obtained scores and the corresponding set of theoretical
true scores.
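
As a brief illustration in Python, taking the index of reliability to be the square root of the coefficient of reliability (the function name is hypothetical):

```python
from math import sqrt


def index_of_reliability(reliability_coefficient):
    """Correlation between obtained scores and theoretical true scores,
    taken as the square root of the coefficient of reliability."""
    if not 0.0 <= reliability_coefficient <= 1.0:
        raise ValueError("the coefficient must lie between 0 and 1")
    return sqrt(reliability_coefficient)


# A coefficient of .81 corresponds to an index of about .90.
print(round(index_of_reliability(0.81), 2))
```

Because the coefficient cannot exceed 1.00, the index is never smaller than the coefficient from which it is computed.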

3. Probable error of measurement. The variable errors of
measurement are the differences between the obtained scores
($X_1$) and the corresponding theoretical true scores ($X_\infty$). The
median deviation of these differences, which is called the probable
error of measurement, is given by the formula⁴

$$PE_{1\infty} = .6745\,\sigma_1\sqrt{1 - r_{1I}}$$

The $PE_{1\infty}$ may be interpreted with respect to the group to
which the test was administered or with respect to any member
1 Symonds, P. M. Diagnosing Personality and Conduct. New York: The
Century Company, 1931, p. 168.

2 See reference to Kelley on page 202.

3 Douglass, H. R., and Cozens, F. W. “On Formula for Estimating the
Reliability of Test Batteries,” Journal of Educational Psychology, 20: 369-77, May,
1929.

Handy, Urvan, and Lentz, T. F. “Item Value and Test Reliability,” Journal
of Educational Psychology, 25: 703-08, December, 1934.

4 For the derivation of this formula, see pages 132-33.
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of that group In the first case, the calculated PEi ^ is the 
median deviation of the variable errors in the obtained test 
scores. When the interpretation of the calculated PEi.^ is 
with reference to the score of a given pupil, ^ it is to be thought 
of as specifying the limit for which the chances are Just fifty- 
fifty that the variable error will not be greater. For example, 
suppose PEi CO == 4 0 Then for a given pupil the chances are 
fifty-fifty that his score involves a variable error not greater 
than 4 0 It is also possible to interpret the coefficient of re- 
hability in terms of the per cent of pupils whose scores involve 
a variable error greater than a specified amount. ^ 

The formula for the probable error of measurement affords 
a means for an interpretation of coefficients of reliability ® in 
terms of the corresponding median deviations of the variable 
errors of measurement. Unless the value of <ti is known, it 
will appear as a factor in this median deviation Table IX 
gives for various values of vu the correspondmg values of the 
probable error of measurement It should be noted that unless 
ai is relatively small, a reliability coefficient must approach 1 OQ 
in order to indicate small variable errors of measurement 

Generalizing a measure of reliability. When a calculated 
coefficient of reliability, index of reliability, or probable error 

1 This application of the probable error of measurement implies the assump- 
tion that the variable errors are uncorrelated with the scores This assumption is 
only approximately true The larger errors appear to be found m the smaller 
scores and the smaller errors in the larger scores However, the degree of corre- 
lation IS not high and the application of the probable error of measurement to 
individual pupils seems to be justified, but the results should not be considered 
highly precise in the case of either very low scores or very high scores See 

Holzmger, K J “An Analysis of Errors m Mental Measurement,” Journal of 
Educalional Psychology, 14 278-88, May, 1923 

2 Herring, J P “The Verification of Group Examinations,” Journal of Edu- 
cational Psychology, 25 596-602, November, 1924 In this article the formula 
used IS not the one given on page 204 For calculations based upon this formula 
see 

Huffaker, C Ir “The Reliability of Measurement by Group Tests of Mental 
Ability,” Journal of Educational Psychology, 16 493-95, October, 1925 

Herring, J P. “Reply to Huffaker’s Criticism,” Journal of Educational 
Psychology, 16 498-99, October, 1925 

2 A coefficient of reliability may also be interpreted in terms of the ratio of the 
variance (square of the standard deviation) of the true scores to the variance of 
the obtained scores. See Chapter XI 
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^of measurement is associated with a test as an index of the 
variable errors of measurement to be expected when it is ad- 
ministered to a group of pupils, generalization is implied. 
Representativeness of the group from which a determination 
of the index is made is a reqmrement for such generalization. 
As pointed out on page 110, a coefficient of correlation is af- 
fected by the variability or range of talent of the group. For a 
given measuring instrument, the coefficient of reliability cal- 
culated from a single grade group will be smaller than one 
calculated from a population including a sequence of grades, 
A coefficient of .60 calculated from a single grade group may 
be indicative of smaller variable errors of measurement than a 
coefficient of 90 calculated from a population including grades 
III to VIII. Hence, generalization of a coefficient of reliability 
must be restricted to populations of the same range of talent. 
Even in such cases, consideration of the causes of variable 
errors of measurement suggests that their magnitude may not 
be entirely stable Hence, the generalization should not be con- 
sidered highly dependable 


Table IX. Values of Coefficient of Reliability $r_{1I}$ and Corresponding
Values of Probable Error of Measurement, $.6745\,\sigma_1\sqrt{1 - r_{1I}}$

    $r_{1I}$    $.6745\,\sigma_1\sqrt{1 - r_{1I}}$        $r_{1I}$    $.6745\,\sigma_1\sqrt{1 - r_{1I}}$

     .50          .48σ₁                  .90          .21σ₁
     .55          .45σ₁                  .91          .20σ₁
     .60          .43σ₁                  .92          .19σ₁
     .65          .40σ₁                  .93          .18σ₁
     .70          .37σ₁                  .94          .17σ₁
     .75          .34σ₁                  .95          .15σ₁
     .80          .30σ₁                  .96          .13σ₁
     .82          .29σ₁                  .97          .12σ₁
     .84          .27σ₁                  .98          .10σ₁
     .86          .25σ₁                  .99          .07σ₁
     .88          .23σ₁                 1.00          .00σ₁

The probable error of measurement is assumed to be
independent of the population from which the determination is
made. On the basis of this assumption, a calculated probable
error of measurement is commonly associated with a test as a
measure of the magnitude of the variable errors of measurement
to be expected in the scores yielded by it. This assumption is
probably only approximated and hence a calculated probable
error of measurement may not be a highly dependable index.
Furthermore, when comparing tests with reference to their
probable errors of measurement, it should be remembered that
the significance of a probable error of measurement depends
upon the range and magnitude of the scores yielded by the test.
If the scores range from 125 to 200, a probable error of
measurement of 5.0 has less practical significance than when the
scores range only from 25 to 75.

Determining an index of the variable errors of validity to be
expected in the scores yielded by a given test. Although we
speak of determining the validity of a test, it should be
remembered, as pointed out on page 174, that this characteristic of
an instrument for measuring human abilities and traits is not
stable. Hence, generalization from a determination for a given
test is hazardous. Furthermore, it should be noted that the
phrase, “validity of a test,” has no reference to the systematic
error of validity.

The determination of an index of the variable errors of validity
requires criterion measures of the ability or trait specified by the
function of the test.¹ In the case of a prognostic test, certain
criterion measures are obtainable. For example, if the function
of a test is to predict future scholastic success, which is defined
as the school mark received in the field specified, the criterion
measures are the school marks received by the pupils to whom
the test is administered. If the test is designed to measure the
same abilities or traits that are measured by an available test,
the scores yielded by this instrument may be used as the criterion
measures. For example, the Stanford Revision of the Binet Test
is believed to yield highly valid measures. Hence, mental ages
obtained from its use are used as criteria for determining the
validity of new intelligence tests. The more common case,
however, is one in which the ability or trait to be measured is
defined by a phrase such as “ability to read silently,” “ability to
solve arithmetical problems,” “achievement in history,”
“language ability,” “study habits,” or “success in teaching” and
no instrument is available for securing measures of it that are
known to be highly valid. In such cases, testmakers have used
school marks, teachers’ estimates, and composite scores from a
selected group of tests.² Obviously such measures, or
combinations of them, are subject to criticism as criterion measures
and hence determination of an index of the variable errors of
validity to be expected in the scores yielded by a test is usually
not very satisfactory. When criterion measures are available, the
coefficient of correlation between them and the scores yielded by
the test, $r_{1c}$, is called the coefficient of validity.³

1 For a critical discussion of the concept of validity, see Turney, A. H. “The
Concept of Validity in Mental and Achievement Testing,” Journal of
Educational Psychology, 25: 81-95, February, 1934.

2 For partial summaries of procedures that have been employed in
determining validity, see
Foran, T. G. “The Meaning and Measurement of Validity,” The Catholic
University of America Educational Research Bulletins, Vol. V, No. 7.
Washington, D. C.: The Catholic University Press, September, 1930. 27 pp.
Kinney, L. B., and Eurich, A. C. “A Summary of Investigations Comparing
Different Types of Tests,” School and Society, 36: 540-44, October 22, 1932.
Jordan, A. M. “The Validation of Intelligence Tests,” Journal of Educational
Psychology, 14: 348-66, 414-28, September, October, 1923. This reference
gives a comprehensive bibliography up to the date of its publication.
Lee, J. M., and Symonds, P. M. “New-Type or Objective Tests: A Summary
of Recent Investigations,” Journal of Educational Psychology, 24: 21-38,
January, 1933.
Lincoln, E. A. “Studies of the Validity of the Dearborn General Intelligence
Examinations,” Journal of Educational Psychology, 19: 346-49, May, 1928.

3 For the meaning of the coefficient of validity, see page 148. The
interpretation is also considered in Chapter XI.

E. Improvement of Tests

Revising a test to increase its reliability and validity. After a
testmaker has determined the reliability and validity of his
instrument, he may desire to investigate the possibilities of
improving it in these respects. An obvious procedure is that of
lengthening the test. The general form of the Spearman-Brown
formula provides a means of estimating the reliability of a
lengthened test.

$$r_{nn} = \frac{n\,r_{1I}}{1 + (n - 1)\,r_{1I}}$$

In this formula $r_{nn}$ is the reliability coefficient of the lengthened
test, and $n$ is the number of times the length of the test has been
increased.⁴ Solving the equation for $n$, we have

$$n = \frac{r_{nn}(1 - r_{1I})}{r_{1I}(1 - r_{nn})}$$

When written in this form, the number of times the length of the
test must be increased to secure a desired coefficient of reliability
($r_{nn}$) can easily be calculated.⁵ On page 202 attention was called
to the assumptions on which the derivation of the Spearman-Brown
formula is based, and even when these conditions are
satisfied the estimated reliability coefficient of a lengthened
test should be considered a probable upper limit of the actual
reliability.

Since the validity of a test is conditioned by its reliability, the
validity will be increased by lengthening the test. The formula⁶
is

$$r_{nc} = \frac{n\,r_{1c}}{\sqrt{n + n(n - 1)\,r_{1I}}}$$

in which $r_{nc}$ is the coefficient of validity of the lengthened test.
It should be noted, however, that since

$$r_{1c} = r_{\infty\omega}\sqrt{r_{1I}}\,\sqrt{r_{cc}}$$

the upper limit of $r_{nc}$ is $r_{\infty\omega}\sqrt{r_{cc}}$. Hence, if $r_{\infty\omega}$ does not
approach 1.00, it will not be possible to make the test highly valid
by lengthening it.

4 It should be noted that $n$ may have fractional values. When $n = 2$, the
formula simplifies to the form given on page 201.

5 The following reference gives a useful table:
Edgerton, H. A., and Toops, H. A. “A Table for Predicting the Validity and
Reliability Coefficients of a Test When Lengthened,” Journal of Educational
Research, 18: 225-34, October, 1928.

6 Holzinger, K. J. Statistical Methods for Students in Education. Boston: Ginn
and Company, 1928, p. 170.
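The formulas for a lengthened test lend themselves to a short numerical check. The following is a minimal sketch; the function names and the starting values (a reliability of .60 and a validity of .50) are hypothetical. The ceiling is computed as the validity coefficient divided by the square root of the reliability coefficient, which is the limit of the lengthened-test validity as the number of lengthenings grows without bound.

```python
from math import isclose, sqrt


def spearman_brown(r_1I, n):
    """Reliability of a test lengthened n times."""
    return n * r_1I / (1 + (n - 1) * r_1I)


def lengthening_factor(r_1I, r_nn):
    """Number of times a test must be lengthened to reach reliability r_nn."""
    return r_nn * (1 - r_1I) / (r_1I * (1 - r_nn))


def lengthened_validity(r_1c, r_1I, n):
    """Validity coefficient of a test lengthened n times."""
    return n * r_1c / sqrt(n + n * (n - 1) * r_1I)


def validity_ceiling(r_1c, r_1I):
    """Upper limit of validity attainable by lengthening alone."""
    return r_1c / sqrt(r_1I)


r_1I, r_1c = 0.60, 0.50                      # hypothetical starting values

doubled = spearman_brown(r_1I, 2)            # reliability of a doubled test
n_needed = lengthening_factor(r_1I, 0.90)    # lengthening needed for .90
ceiling = validity_ceiling(r_1c, r_1I)

print(round(doubled, 2), round(n_needed, 1), round(ceiling, 3))
for n in (1, 2, 4, 8, 16):
    print(n, round(lengthened_validity(r_1c, r_1I, n), 3))
```

The printed validities rise with each lengthening but by smaller and smaller amounts, and none passes the ceiling.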

The reliability and validity of a test are affected by the
difficulty of the test items,¹ administration time,² type of test
exercises, method of scoring, directions for doing the exercises,
and possibly by other factors. Hence, in revising a test to
increase its reliability and validity, a testmaker should
investigate the possibility of effecting improvement in ways in
addition to increasing its length.

1 Thurstone, T. G. “The Difficulty of a Test and Its Diagnostic Value,”
Journal of Educational Psychology, 23: 335-43, May, 1932.

2 Lindquist, E. F., and Cook, W. W. “Experimental Procedures in Test
Evaluation,” Journal of Experimental Education, 1: 163-85, March, 1933.

GENERAL REFERENCES AND SELECTED ILLUSTRATIONS

This bibliography is mainly a selected list of references dealing with the
theory and technique in test construction. A few references of historical
interest and a few texts describing tests available for use have been
included. The references of the latter type will be helpful to an investigator
who is seeking a test for a particular purpose. The bibliography by Hildreth
is very complete and may be consulted in this connection.

Ayres, L. P. “A Measuring Scale for Ability in Spelling,” Russell Sage
Foundation Bulletin E 139. New York: Russell Sage Foundation, 1915.
56 pp.

Spelling scale based on one thousand words found to be of most common
occurrence in a large amount of correspondence.

Binet, A., et Simon, T. “Méthodes Nouvelles pour le Diagnostic du
Niveau Intellectuel des Anormaux,” L’Année Psychologique, 11: 191-244,
1905.

First individual intelligence test.

Bovard, J. F., and Cozens, F. W. “Tests and Measurements in Physical
Education, 1861-1925,” Oregon University Publications, Physical
Education Series, Vol. 1, No. 1. Eugene, Oregon: University of Oregon,
1926. 94 pp. See also Bovard, J. F., and Cozens, F. W., Tests and
Measurements in Physical Education. Philadelphia: W. B. Saunders
Company, 1930. 364 pp.

Brown, William, and Thomson, G. H. Essentials of Mental Measurement.
London: Cambridge University Press, 1921. 216 pp.
Buckingham, B. R. “Spelling Ability: Its Measurement and Distribution,”
Teachers College, Columbia University Contributions to Education,
No. 59. New York: Bureau of Publications, Teachers College, Columbia
University, 1913. 116 pp.

First example of a measuring instrument in which difficulty values were
determined from per cents of correct response and in which the items were
arranged in order of increasing difficulty.

Courtis, S. A., et al. “The Measurement of Educational Products,”
Seventeenth Yearbook of the National Society for the Study of Education,
Part II. Bloomington, Illinois: Public School Publishing Company,
1918. 192 pp.

An excellent source for information respecting the status of the
measurement movement in 1918.

Dearborn, W. F. Intelligence Tests. Boston: Houghton Mifflin Company,
1928. 336 pp.

Freeman, F. N. Mental Tests. Boston: Houghton Mifflin Company, 1926.
503 pp.

Fryer, Douglas. The Measurement of Interests in Relation to Human
Adjustment. New York: Henry Holt and Company, 1931. 488 pp.

Hartshorne, Hugh, and May, M. A. Studies in Deceit. New York: The
Macmillan Company, 1928. 414 and 306 pp. (Book One and Book Two
are bound in the same volume which has the title “Studies in the
Nature of Character.”)

Pioneer research in the measurement of character traits.

Hildreth, G. H. A Bibliography of Mental Tests and Rating Scales. New
York: The Psychological Corporation, 1933. 242 pp. This compilation
has been supplemented by Buros, O. K. “Educational, Psychological,
and Personality Tests of 1933 and 1934,” Studies in Education, Rutgers
University Bulletin, Vol. XI, No. 11. New Brunswick, New Jersey:
Rutgers University, 1935. 44 pp.

Hillegas, M. B. “A Scale for the Measurement of Quality in English
Composition by Young People,” Teachers College Record, 13: 331-84,
September, 1912.

First English composition scale.

Hull, C. L. Aptitude Testing. Yonkers-on-Hudson, New York: World
Book Company, 1928. 535 pp.

Kelley, T. L. Interpretation of Educational Measurements. Yonkers-on-Hudson,
New York: World Book Company, 1927. 363 pp.
Kwalwasser, Jacob. Tests and Measurement in Music. Boston:
C. C. Birchard and Company, 1927. 146 pp.

McCall, W. A. “A Proposed Uniform Method of Scale Construction,”
Teachers College Record, 22: 31-52, January, 1921.

Proposal of the T-score technique.

Monroe, W. S. An Introduction to the Theory of Educational Measurements.
Boston: Houghton Mifflin Company, 1923. 364 pp.

The first comprehensive treatment of the theory of educational
measurements.

Odell, C. W. Educational Measurement in High School. New York: The
Century Company, 1930. 641 pp.

Odell, C. W. Traditional Examinations and New Type Tests. New York:
The Century Company, 1928. 469 pp.

Otis, A. S. “An Absolute Point Scale for the Group Measurement of
Intelligence,” Journal of Educational Psychology, 9: 239-61, 333-48,
May, June, 1918.

First group intelligence test.

Peterson, Joseph. Early Conceptions and Tests of Intelligence.
Yonkers-on-Hudson, New York: The World Book Company, 1925. 320 pp.

Early history of intelligence testing.

Pintner, Rudolf. Intelligence Testing: Methods and Results. New York:
Henry Holt and Company, 1931. 555 pp.

Ruch, G. M. The Objective or New Type Examination. Chicago: Scott,
Foresman and Company, 1929. 478 pp.

Ruch, G. M., and Stoddard, G. D. Tests and Measurement in High School
Instruction. Yonkers-on-Hudson, New York: World Book Company,
1927. 381 pp.

Symonds, P. M. Diagnosing Personality and Conduct. New York: The
Century Company, 1931.

Symonds, P. M. Measurement in Secondary Education. New York: The
Macmillan Company, 1927. 588 pp.

Terman, L. M. The Intelligence of School Children. Boston: Houghton
Mifflin Company, 1919. 317 pp.

Supplements the earlier volume, The Measurement of Intelligence (1916),
which is largely devoted to the actual administration of the Stanford
Revision.

Terman, L. M., and Childs, H. G. “A Tentative Revision and Extension
of the Binet-Simon Measuring Scale of Intelligence,” Journal of
Educational Psychology, 3: 61-74, 113-43, 198-208, 277-89, February,
March, April, May, 1912.

First appearance of the Stanford Revision of the Binet-Simon Scale.

Thorndike, E. L. An Introduction to the Theory of Mental and Social
Measurements. New York: Teachers College, Columbia University,
1904. 277 pp. (Revised edition, 1913.)

The pioneer book in its field.

Thorndike, E. L. “Handwriting,” Teachers College Record, 11: 1-93,
March, 1910.

First quality scale.

Thorndike, E. L., et al. The Measurement of Intelligence. New York:
Bureau of Publications, Teachers College, Columbia University, 1926.
616 pp.

Wilson, G. M. “The Purpose of a Standardized Test in Spelling,” Journal
of Educational Research, 20: 319-26, December, 1929.

Discusses a basic problem in test standardization.

Wilson, G. M., and Hoke, K. J. How to Measure. New York: The
Macmillan Company, 1928. 597 pp. (Revised and enlarged.)

Wood, B. D. Measurement in Higher Education. Yonkers-on-Hudson,
New York: World Book Company, 1923. 337 pp.

Yerkes, R. M. (Editor). “Psychological Examining in the United States
Army,” Memoirs of the National Academy of Sciences, Vol. 15.
Washington: Government Printing Office, 1921. 890 pp.

Describes the use of the Army Alpha and Army Beta in 1917-18.



CHAPTER VIII 


STUDYING CURRENT CONDITIONS OR PRACTICES 

The general character of survey investigations. The general
character of studies of current conditions and practices is
illustrated by surveys of school systems¹ whose findings include such
items as per capita expenditures for educational purposes,
average salary of teachers, average pupil achievement as revealed by
certain tests, average size of class, facts pertaining to school
buildings, years of training and experience of teachers, and
age-grade status of pupils. Frequently a survey is less
comprehensive, but the general purpose is to reveal the status of
what is studied. The following titles are indicative of typical
purposes.²

A Tentative Inventory of the Habits of Children from Two to Four
Years of Age (2)

The Vocabulary of American History (5)

Mistakes Which Pupils Make in Spelling (10)

The Status of the Superintendent (13)

The Social Composition of Boards of Education (16)

The Duties of the Elementary-School Principal (20)

Educational Magazines Read by Five Hundred Elementary School
Principals and Classroom Teachers (25)

A Survey of the Requirements for the Doctor of Philosophy in
Education (34)

Subject Combinations in the Programs of Teachers in Small
Secondary Schools in New York State (35)

The Training of Modern Foreign Language Teachers in the United
States (45)

The Diversity of High School Students’ Programs (49)

1 For a brief account of the development of the survey movement see Caswell,
H. L. “City School Surveys,” Teachers College, Columbia University
Contributions to Education, No. 358. New York: Bureau of Publications,
Teachers College, Columbia University, 1929, Chapter II.
It is interesting to learn that at the invitation of the governor of Rhode
Island, Henry Barnard made a survey of the public schools of that state in
1845. There was also a survey of school achievement in Boston during the
same year.

2 The numbers in parentheses following the titles refer to the illustrative
bibliography at the end of the chapter.

Sometimes the purpose is extended to include a comparison of 
the status of two or more populations as indicated by the follow- 
ing titles. 

A Comparative Study of White and Colored Pupils in a Southern 
School System (23) 

A Comparison of the Achievement of Eighth-Grade Pupils in Rural
Schools and in Graded Schools (31)

A Comparative Study of the Physical Growth of Dull Children (51) 

Inquiries of the survey type range from elaborate studies of
city or state school systems, and even national inquiries, to
simple studies pertaining to a limited area. Occasionally a
survey is narrowed down to an intensive study of a single pupil or
a small group of pupils studied separately. Such surveys are
called “case studies.” Facts pertaining to current conditions
and practices frequently become more meaningful when they
are compared with information respecting corresponding
conditions and practices of the past. Such comparisons may reveal
trends which are useful in predicting the future. When the trend
studied is that of the growth of a human trait, the research is
frequently designated as “genetic.” Sometimes the data
collected in a survey are utilized as a basis for investigating the
existence of relationships. Such studies transcend the types of
research to be considered in this chapter except when the
techniques employed are relatively simple.¹

1 Applications of correlation analysis in the study of relationships are
considered in Chapter XI.

The definition of survey problems. Frequently the problem
of a survey is first conceived of in general terms, but in order to
serve as a guide for the subsequent phases of the investigation,
it must be defined. This definition should include the
specification of the scope of the survey, the specific questions for which
answers are to be sought, and the meaning of the technical terms
employed. As the investigator engages in the later stages of his
research, he may decide to modify some of the questions listed
or even to omit certain ones. Sometimes he may become
interested in increasing the scope of his study. The possibility of
such changes, however, does not lessen the desirability of
formulating an effective definition as a first step. When a survey has
been carefully planned in advance, its later stages, with the
exception of interpretation, tend to become routine in character,
an important consideration in large-scale investigations where
most of the labor must be done by clerical workers.

Adequate definition of terms is especially important when a
specified trait or characteristic cannot be measured directly.
For example, if the survey is to determine the “average teaching
load” of the high schools within a certain area, measures of this
characteristic cannot be obtained directly but must be
computed from such basic items as number of students, number of
teachers, and hours per week spent in classrooms. Since no
standard method has been established for computing the
average teaching load of a high school, an investigator must
formulate a definition in terms of the calculations to be made. If this
is not done in planning the survey, he may find that he has
failed to collect some of the required data.
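Writing the definition out as a calculation makes plain which items must be collected. The sketch below is purely illustrative: the definition used (pupils per teacher multiplied by weekly classroom hours per teacher) is one hypothetical choice among many, since the text notes that no standard method exists, and the record layout, field names, and figures are assumptions made for the example.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class SchoolRecord:
    """Basic items to be collected for each high school (hypothetical)."""
    students: int
    teachers: int
    classroom_hours_per_teacher: float  # hours per week


def teaching_load(school):
    """One possible definition: pupil-hours per teacher per week."""
    pupils_per_teacher = school.students / school.teachers
    return pupils_per_teacher * school.classroom_hours_per_teacher


def average_teaching_load(schools):
    """Unweighted mean of the school loads within the area surveyed."""
    loads = [teaching_load(s) for s in schools]
    return sum(loads) / len(loads)


# Hypothetical records for two schools.
area = [
    SchoolRecord(students=600, teachers=24, classroom_hours_per_teacher=25.0),
    SchoolRecord(students=150, teachers=10, classroom_hours_per_teacher=30.0),
]
first_load = teaching_load(area[0])
area_average = average_teaching_load(area)
print(first_load, area_average)
```

Had the survey omitted classroom hours, the measure as defined here could not be computed, which is the difficulty the paragraph warns against.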

A. Techniques of Surveys

The data and their collection.¹ The data of surveys vary.
They include facts pertaining to such conditions and practices
as school costs; educational legislation; duties or activities of
school people; practices with respect to state control of
education; overlapping of courses; vocabularies of textbooks;
vocational opportunities of high school graduates; achievement,
intelligence, and other traits of pupils; socio-economic status of
homes; current theories in education; recognized objectives;
numbers and titles of books in high school libraries; styles of
architecture of school buildings; practices in scheduling
recitations; and provisions of teachers’ contracts. The sources of data
include published materials such as books, periodicals, and
monographs in the field of education and unpublished materials
such as school records. They include school people such as
superintendents, principals, and teachers from whom data may
be collected by correspondence, by interview, or by observation.
They include pupils from whom the data may be collected by
the administration of tests or questionnaires, or by observation
and interview. School buildings, their equipment, and their
environment may also be included as sources of data collected
by observation.

1 The interested reader may consult the following for information relative to
practices in collecting survey data:
Caswell, H. L. “Survey Techniques,” Educational Administration and
Supervision, 19: 431-41, September, 1933.
Davis, B. C. “Methods and Techniques Used in Surveying Health and
Physical Education in City Schools,” Teachers College, Columbia University
Contributions to Education, No. 515. New York: Bureau of Publications,
Teachers College, Columbia University, 1932. 162 pp.

The basic techniques employed in collecting data were
considered in Chapter III, but a few points may be emphasized
here. When data are copied from records or published reports,
the investigator should inquire into their accuracy and make
certain of the precise meaning of the label of each item.
Interviewers and observers should receive training before beginning
to collect data for a survey. Training is also desirable when
analysis is being employed. If possible, at least a portion of the
analysis should be checked by a second worker. Questionnaires
should be constructed with care. If the investigator is
inexperienced, he should secure the criticism of competent persons
and if possible try out the questionnaire by submitting it to a
representative group not included in the survey. In selecting a
test, its function should be noted. If it is necessary to construct
a test, the advice of competent persons should be sought and an
attempt should be made to determine the function of the
completed instrument. In administering a test, the instructions
should be followed unless deviations are considered desirable.
When this is the case, changes in the instructions should be
determined and reduced to writing.

Securing representativeness of data in survey investigations.
When the scope of a survey is comprehensive, it may not be
feasible to collect data for the entire population designated by
the problem. In such cases some process of sampling must be
resorted to. A sample may be regarded as satisfactory to the
extent that the data obtained are representative of the specified
population or area. Hence, the investigator faces the problem
of obtaining a highly representative sample.

Occasionally a method of random sampling may be employed.
A random sample is not representative except by chance, but the
probable deviation from representativeness may be calculated.
When a method of random sampling is not employed, the
investigator may be able to secure a highly representative sample.
If the larger population or universe is stratified, a selection
should be made from each stratum. The size of the samples
should be proportional to the size of the respective strata. For
example, if data are to be collected from school systems of
varying size and there are many more small systems than large ones,
the number of small systems investigated should be
proportionally greater than the number of large ones. However, if the
small systems are more homogeneous in character than the
large ones, the per cent of small systems investigated will not
need to be as large as the relative numbers of large and small
systems would indicate. These principles can be applied in the
collection of various types of data. For example, when
textbooks are analyzed and sampling must be employed, the size
of the samples should be proportional to the space given to
important topics, lengths of the texts, number of texts in
different subjects of the problem, and so on, according to the
requirements of the problem. A text of limited range of vocabulary
requires a relatively smaller sample than a text whose vocabulary
range is extensive.
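The rule that the sample from each stratum should be proportional to the size of the stratum can be sketched as follows. This is a minimal illustration under stated assumptions: the stratum names, their sizes, and the rounding rule (largest remainders receive the extra cases) are hypothetical, and the adjustment for more homogeneous strata mentioned above is not attempted.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Divide sample_size among strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    exact = {name: sample_size * size / total
             for name, size in strata_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    # Hand any cases lost to rounding to the largest fractional remainders.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


def draw_stratified_sample(strata, sample_size, seed=0):
    """Draw a random selection of the allotted size from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}


# Hypothetical universe of 1,000 school systems grouped by size.
strata = {
    "small": [f"small-{i}" for i in range(800)],
    "medium": [f"medium-{i}" for i in range(150)],
    "large": [f"large-{i}" for i in range(50)],
}
allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 100)
sample = draw_stratified_sample(strata, 100)
print(allocation)
```

Drawing at random within each stratum is one way of making the selection; the text leaves the method of selection within strata open.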

When the data are collected by means of a questionnaire, the
selection is determined by influences that the investigator can
control only indirectly. When observation or interview is
employed, accessibility is likely to be a determining factor. In
such cases determination of the degree of representativeness is
an important subordinate problem. There is no single technique
for accomplishing this. When there is justification for expecting
that the frequency distributions of a representative sample will
approach the normal shape, this criterion may be applied. The
construction of a frequency polygon or histogram will be
sufficient in many cases to indicate whether or not an approximately
normal distribution has been obtained. When a more precise
determination is desired, Pearson's test may be applied.¹
Thomson and Pintner have suggested that “one criterion for
the unselected nature of any group of children is that the
coefficient of correlation of age and mental age in such a group
should be approximately equal to the ratio of the coefficients of
variability of chronological and mental ages.”² It should be
noted, however, that in certain cases a normal distribution
should not be expected. For example, a representative sample
of “gifted” children should yield a distribution of intelligence
quotients far different from a normal one.
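The criterion suggested by Thomson and Pintner can be computed in a few lines. The sketch below rests on two assumptions that should be stated plainly: the coefficient of variability is taken as 100 times the standard deviation divided by the mean, since the formula referred to in the footnote is not reproduced in this section, and the ratio is taken as that of chronological age to mental age. The ages shown are invented for illustration and no conclusion should be drawn from them.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def standard_deviation(values):
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def coefficient_of_variability(values):
    """Assumed form: 100 * standard deviation / mean."""
    return 100.0 * standard_deviation(values) / mean(values)


def correlation(xs, ys):
    """Pearson product-moment coefficient of correlation."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return covariance / (standard_deviation(xs) * standard_deviation(ys))


# Hypothetical ages in months for eight children.
chronological = [96, 102, 108, 114, 120, 126, 132, 138]
mental = [90, 108, 100, 120, 115, 135, 125, 144]

r_age_mental = correlation(chronological, mental)
variability_ratio = (coefficient_of_variability(chronological)
                     / coefficient_of_variability(mental))
discrepancy = abs(r_age_mental - variability_ratio)
print(round(r_age_mental, 3), round(variability_ratio, 3), round(discrepancy, 3))
```

How small the discrepancy must be before a group is regarded as unselected is a matter of judgment that the criterion itself does not settle.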

The representativeness of newly collected survey data may
frequently be tested by comparison with similar data reported
in such published sources as the Federal Census, the Biennial
Surveys of Education of the United States Office of Education,
and reports of national, state, or city surveys. In making such
evaluations of the representativeness of new data, the meaning
and comparability of units should be noted. The possibility of
changed conditions since the publication of the criterion should
also be considered.

In the study of trends, smaller samples may be adequately 
representative of conditions in the earlier years, if the passage of 
time has resulted in increases of populations and in their hetero- 
geneity. In considering the representativeness of pupil popula- 
tions over a period of years, the operation of selection must be 
recognized as a significant factor. Thirty years ago the high 

¹ See Chapter IV, page 79. 

² Thomson, G. H., and Pintner, Rudolph. “Spurious Correlation and Rela- 
tionship between Tests,” Journal of Educational Psychology, 15: 433-44, October, 
1924. For an application of this test, but not in a survey study, see Heilman, 
J. D. “Factors Determining Achievement and Grade Location,” Journal of 
Genetic Psychology, 36: 439-40, September, 1929. See page 234 for the formula 
for the coefficient of variability. 

school population was highly select in comparison with the 
children of high school age of that time. At present, approx- 
imately one child of high school age out of every two children 
is in high school. If a group of school children is studied over a 
period of years, elimination from school will decrease the extent 
to which the group is representative of children “in general.” 
In studying trends in the development of traits, different groups 
are frequently selected from the different grade levels. These 
groups may not be comparable in representativeness to a single 
group studied over a period of years. 

Computing derived measures.¹ A ratio, frequently expressed 
as a per cent, is a very common derived measure. For example, 
one may wish to express the score of a pupil on a spelling test in 
terms of the per cent of words he has spelled correctly. It may 
be desired to determine what per cent of educational expendi- 
tures is for teachers' salaries. A ratio may be computed for any 
convenient base. “Pupils per thousand population,” “pupils 
per teacher,” “expenditures per pupil in average daily attend- 
ance,” “school indebtedness per capita,” and “language errors 
per thousand words of written expression” are illustrations. 
In his appraisal of secondary school commercial education, 
W. R. Odell calculated the ratio of public school enrollments 
in certain commercial vocational subjects to the number of 
workers in the vocations as shown by the Census of 1930.² 
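
The computation of a ratio for any convenient base may be sketched as follows. This is an illustrative Python fragment only; the function name `ratio_per_base` and all of the figures are hypothetical.

```python
def ratio_per_base(numerator, denominator, base=100):
    """Express numerator/denominator per `base` units (base=100 gives a per cent)."""
    if denominator == 0:
        raise ValueError("the base measure of a ratio must not be zero")
    return base * numerator / denominator


# Hypothetical figures, for illustration only.
per_cent_correct = ratio_per_base(42, 50)                     # words spelled correctly, per cent
pupils_per_thousand = ratio_per_base(1850, 12400, base=1000)  # pupils per thousand population
pupils_per_teacher = ratio_per_base(1850, 74, base=1)         # pupils per teacher

print(per_cent_correct, round(pupils_per_thousand, 1), pupils_per_teacher)
```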

The calculation of ratios is simple, but attention should be 
given to the basic data, especially when comparisons are to be 
made. If two cities are to be compared with respect to per cent 
of educational expenditures devoted to teachers' salaries, the 
term “educational expenditures” should have the same mean- 
ing for both cities, i.e., the same items of expense should be 
included. It should not include in one case “interest on bonds” 
and exclude it in the other. Furthermore, the term “teacher” 
should have the same meaning in both systems. It should not 

¹ For the calculation of age scores, T-scores, percentile scores, and the like, 
consult the Index of this volume. 

² Odell, W. R. “An Appraisal of Secondary School Commercial Education,” 
Teachers College Record, 34: 43-52, October, 1932. 

mean in one case merely classroom teachers and include in the 
other case elementary principals who devote some of their time 
to teaching. Hence, the measures from which a ratio is to be 
calculated should be defined with precision, and if comparisons 
are to be made, the investigator should make certain that the 
measures conform to these definitions in each population unit. 

A sum or average may be desired as a derived measure. For 
example, if several achievement tests have been administered, it 
may be desired to obtain a composite measure of achievement. 
Since the scores yielded by different tests are seldom expressed in 
terms of equivalent units and from a common zero point, 
transformation of the obtained scores to a common scale is a 
logical preliminary step.¹ The transformed measures may then 
be combined as desired. Frequently a special formula can be 
derived which will simplify the total process. If criterion meas- 
ures are available, multiple regression may be employed as a 
means of determining such a formula. 
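
A composite of several test scores may be sketched as follows. The example assumes, purely for illustration, that the common scale is the standard score (deviation from the mean divided by the standard deviation); the text itself refers the reader elsewhere for the transformations it has in mind. The function names, test names, scores, and equal weights are all hypothetical.

```python
from math import sqrt


def standard_scores(scores):
    """Transform raw scores to deviations from the mean in standard-deviation units."""
    n = len(scores)
    m = sum(scores) / n
    sd = sqrt(sum((s - m) ** 2 for s in scores) / n)  # assumes the scores are not all equal
    return [(s - m) / sd for s in scores]


def composite_scores(tests, weights=None):
    """tests maps a test name to its raw scores, pupils in the same order throughout."""
    names = list(tests)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights unless others are supplied
    transformed = {name: standard_scores(tests[name]) for name in names}
    n_pupils = len(tests[names[0]])
    return [sum(weights[name] * transformed[name][i] for name in names)
            for i in range(n_pupils)]


# Hypothetical raw scores of three pupils on two tests with different units.
tests = {"reading": [40, 55, 70], "arithmetic": [12, 18, 15]}
composite = composite_scores(tests)
print([round(c, 2) for c in composite])
```

When criterion measures are available, the weights would be taken from a multiple regression rather than set equal as they are here.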

Raw data are sometimes combined to form complex index 
measures. On the basis of the number of school districts, the 
average daily attendance in elementary schools, the average 
daily attendance in high schools, the density of rural school 
populations, and expenditures for transportation in rural dis- 
tricts, Mort ² has calculated indices of educational need by 
means of a formula derived through the use of techniques of 
curve-fitting. Burns ³ employed a similar technique in cal- 
culating an index of transportation need. An index measure of 
school building utilization has been devised by Morphet.⁴ For 

¹ See pages 82-83. 

² Mort, P. R. “The Measurement of Educational Need: A Basis for Distribut- 
ing State Aid,” Teachers College, Columbia University Contributions to Education, 
No. 150. New York: Bureau of Publications, Teachers College, Columbia Uni- 
versity, 1924. 85 pp. 

³ Burns, R. L. “Measurement of the Need for Transporting Pupils,” Teachers 
College, Columbia University Contributions to Education, No. 289. New York: 
Bureau of Publications, Teachers College, Columbia University, 1927. 61 pp. 

⁴ Morphet, E. L. “The Measurement and Interpretation of School Building 
Utilization,” Teachers College, Columbia University Contributions to Education, 
No. 264. New York: Bureau of Publications, Teachers College, Columbia Uni- 
versity, 1921, pp. 21 f. 

use at the college level, Reeves and Russell ¹ proposed as a 
“weighted index of teaching load” the sum of the ratio of an 
instructor's “teaching hours” to the average for his institution, 
the ratio of his “preparation hours” to the institutional average, 
and two times the ratio of his “student hours” to the institu- 
tional average. An index measure for textbooks has been 
proposed by Patty and Painter ² and a “student ability 
index” by Banker.³ In 1920, Ayres ⁴ proposed as an index 
measure for state school systems a composite of ten different 
elements of which five are measures of the amount of education 
received by children and five are measures of the expendi- 
tures made to purchase this education. Other indices are to 
be found in studies reported by Burgess,⁵ Norton,⁶ and Clark.⁷ 
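
The weighted index of teaching load attributed above to Reeves and Russell lends itself to a short illustration. The sketch below follows the verbal description only (two simple ratios plus twice a third); the dictionary keys and all of the hours are hypothetical.

```python
def teaching_load_index(instructor, institution_average):
    """Sum of three ratios to the institutional average, with student hours counted twice."""
    return (
        instructor["teaching_hours"] / institution_average["teaching_hours"]
        + instructor["preparation_hours"] / institution_average["preparation_hours"]
        + 2 * instructor["student_hours"] / institution_average["student_hours"]
    )


# Hypothetical weekly figures for one institution and one instructor.
average = {"teaching_hours": 15, "preparation_hours": 10, "student_hours": 300}
instructor = {"teaching_hours": 18, "preparation_hours": 12, "student_hours": 360}

index = teaching_load_index(instructor, average)
print(round(index, 2))
```

An instructor whose three measures all equal the institutional averages receives an index of 4 under this definition, so values above 4 indicate a load heavier than average.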

¹ Reeves, R. W., and Russell, J. D. College Organization and Administration. 
Indianapolis, Indiana: Board of Education, Disciples of Christ, 1929, pp. 176 f. 

For another index measure of teaching load, see Douglass, H. R. Organization 
and Administration of Secondary Schools. Boston: Ginn and Company, 1932, 
pp. 114 f. 

² Patty, W. W., and Painter, W. I. “A Technique for Measuring the Vocabu- 
lary Burden of Textbooks,” Journal of Educational Research, 24: 127-34, 
September, 1931. 

³ Banker, H. J. “A Student's Ability Index from Teacher's Marks,” Journal 
of Educational Research, 17: 357-64, May, 1928. 
Banker, H. J. “The Practical Application of the Student's Ability Index,” 
Journal of Educational Research, 18: 282-89, November, 1928. 
Banker, H. J. “The Student's Ability in Higher Educational Institutions,” 
Journal of Educational Research, 26: 276-83, December, 1932. 

⁴ Ayres, L. P. An Index Number for State School Systems. New York: Russell 
Sage Foundation, 1920. 70 pp. 

Ayres published an earlier study in which the states were ranked according to 
ten characteristics. Ayres, L. P. A Comparative Study of Education in the 
States. New York: Russell Sage Foundation, 1912. 

See also Phillips, F. M. Educational Ranking of States by Two Methods. 
Milwaukee: Bruce Publishing Company, 1925. 
Phillips, F. M. “Educational Rank of States, 1930,” American School Board 
Journal, 84: 25-29, 29-30, 37-39, 39-40, February, March, April, and May, 
1932. 

⁵ Burgess, W. R. Trends of School Costs. New York: Russell Sage Founda- 
tion, 1924. 142 pp. 

⁶ Norton, J. K. “The Ability of the States to Support Education,” Research 
Bulletin of the National Education Association, Vol. 4, No. 1-2. Washington: 
National Education Association, 1926. 88 pp. 

⁷ Clark, H. F. “The Effect of Population upon Ability to Support Educa- 
tion,” Journal of Educational Research, 14: 336-39, December, 1926. 

A more complete discussion is given in Clark, H. F. “The Effect of Popula- 
tion upon Ability to Support Education,” Bulletin of the School of Education, 

Clark ¹ has discussed the uses of index numbers in education. 
He stresses their importance in four different phases of school 
administration: (1) teachers' salaries and costs of living; 
(2) school buildings; (3) school bonds; (4) instructional supplies. 
The procedures by means of which such index measures are 
calculated represent attempts to give an appropriate weight to 
certain aspects of a complex condition or phenomenon. Some of 
the resulting indices are reasonably satisfactory measures, but, 
in general, an investigator who calculates indices should bear 
in mind the possibility that the measures obtained will in- 
volve relatively large variable errors. 

An investigator may desire the ranking of the members or 
units of the population being surveyed rather than their absolute 
measures. For example, he may desire the ranks of the pupils 
of a class rather than their scores on a test. If none of the units, 
or individuals, have the same raw measures, numbering from 
the highest to the lowest on the scale gives their respective 
ranks. If two or more individuals have the same score, or 
measure, an average rank is given as in the following example: 

    Raw Measure      Rank
        24             1
        20             2.5
        20             2.5
        17             4
        15             5
        14             7
        14             7
        14             7
        10             9
         8            11.5
         8            11.5
         8            11.5
         8            11.5
         7            14
         5            15

Indiana University, Vol. 2, No. 1. Bloomington, Indiana: Indiana University, 
1925. 29 pp. 

¹ Clark, H. F. “Index Numbers in Educational Work,” Teachers College Rec- 
ord, 30: 453-60, February, 1929. 

The interested reader should consult the articles on school-bond prices, which 

It should be noted that the final rank equals the number of 
cases, unless two or more raw measures have the same value 
at the bottom of the series. 
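
The assignment of average ranks to tied measures may be sketched in Python as follows. The raw measures are those of the example in the text; the function name `average_ranks` is hypothetical.

```python
def average_ranks(measures):
    """Rank from highest to lowest; tied measures share the mean of the ranks they occupy."""
    ordered = sorted(measures, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    return [rank_of[value] for value in measures]


raw = [24, 20, 20, 17, 15, 14, 14, 14, 10, 8, 8, 8, 8, 7, 5]
ranks = average_ranks(raw)
for measure, rank in zip(raw, ranks):
    print(measure, rank)
```

The two measures of 20 occupy the second and third places and so receive the rank 2.5; the four measures of 8 occupy places ten through thirteen and receive 11.5.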

Another method of assigning ranks is that of computing the 
percentile rank of each measure in the series, or of estimating 
such ranks by comparison with percentile points as calculated 
or as located on a percentile curve.¹ In a simple series of meas- 
ures the percentile rank of a given one may be easily computed 
by calculating the per cent of measures below that one in the 
series. It should be noted that the order of percentile ranks is 
the reverse of ordinary ones. An ordinary rank of “1” is the 
highest on the scale; a percentile rank of “1” means that only 
one per cent of the measures are below that measure in the 
series. 
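
The simple computation of a percentile rank, as the per cent of measures below a given one, may be illustrated as follows. The series is borrowed from the ranking example earlier in this section; the function name is hypothetical.

```python
def percentile_rank(measure, series):
    """Per cent of the measures in the series that fall below the given measure."""
    below = sum(1 for value in series if value < measure)
    return 100 * below / len(series)


raw = [24, 20, 20, 17, 15, 14, 14, 14, 10, 8, 8, 8, 8, 7, 5]
for measure in (24, 14, 5):
    print(measure, round(percentile_rank(measure, raw), 1))
```

The highest measure has the largest percentile rank and the lowest measure a percentile rank of zero, which is the reversal of order noted in the text.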

If the raw data are in the form of rankings, transformation 
into amount scores may be effected providing the assumption 
is made that the distribution of the trait or characteristic 
approaches the normal shape in the population ranked.² 
When each of the members of the population has been ranked 
by several judges, the technique employed in constructing a 
quality scale may be used for obtaining average amount 
scores.³ 

When the members of a population have been ranked with 
reference to several traits or characteristics or have been ranked 
with reference to a given trait or characteristic by several 

have been published by Clark since January, 1928, in the American School 
Board Journal. 

For a general treatment of index numbers, consult: 

Chaddock, R. E. Principles and Methods of Statistics. Boston: Houghton 
Mifflin Company, 1925, pp. 175-206. 

Fisher, Irving. The Making of Index Numbers. Boston: Houghton Mifflin 
Company, 1923. 526 pp. 

Kelley, T. L. Statistical Method. New York: The Macmillan Company, 
1923, pp. 331-47. 

¹ See Chapter IV, page 84. 

² For a description of the method and a table to facilitate the transformation, 
see Hull, C. L. “The Computation of Pearson's r from Ranked Data,” Journal 
of Applied Psychology, 6: 385-90, December, 1922. 

³ For a description of the procedure, see Monroe, W. S. Introduction to the 
Theory of Educational Measurements. Boston: Houghton Mifflin Company, 1923, 
pp. 133-44. 

judges, the obtained rankings may be combined by computing 
the mean or median of the assigned ranks. One may also total 
the ranks assigned to each member of a population and use these 
sums as a basis for an average ranking. If some of the series of 
rankings are incomplete, an average rank may be computed by 
transforming each set of rankings into amount scores and then 
computing the average score. From these average scores, an 
average ranking may easily be obtained.¹ 
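
The combination of complete rankings by the mean of the assigned ranks may be sketched as follows. The sketch assumes that every judge has ranked every member; the transformation to amount scores required for incomplete rankings is not attempted. The pupils' names and the ranks are hypothetical.

```python
def combine_rankings(rankings):
    """rankings: one dict per judge mapping each member to the rank assigned (1 = highest)."""
    names = list(rankings[0])
    mean_rank = {name: sum(judge[name] for judge in rankings) / len(rankings)
                 for name in names}
    # Order the members from the smallest mean rank (highest standing) to the largest.
    ordered = sorted(names, key=lambda name: mean_rank[name])
    return mean_rank, ordered


# Hypothetical rankings of four pupils by three judges.
judges = [
    {"Ann": 1, "Ben": 2, "Cal": 3, "Dee": 4},
    {"Ann": 2, "Ben": 1, "Cal": 4, "Dee": 3},
    {"Ann": 1, "Ben": 3, "Cal": 2, "Dee": 4},
]
mean_rank, ordered = combine_rankings(judges)
print(ordered)
```

Totaling the ranks instead of averaging them would give the same order, since every total is divided by the same number of judges.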

Classification of data. If only a single frequency distribution 
is desired, the classification of data is usually a simple matter, 
but frequently separate tabulations are desired for certain sub- 
populations. For example, if the high schools of a state are being 
surveyed with reference to the number of pupils per teacher, the 
investigator may desire separate tabulations for schools enrolling 
less than one hundred pupils, schools enrolling one hundred up 
to one hundred fifty pupils, schools enrolling one hundred fifty 
up to two hundred pupils, and so on. He may desire also 
separate tabulations for geographical divisions of the state. In 
elaborate surveys the determination of the sub-populations 
for which separate tabulations are to be made may require care- 
ful consideration. Survey findings are more meaningful when 
they are for populations that are relatively homogeneous with 
respect to significant characteristics, but a large number of sub- 
populations makes the interpretation more difficult. In deter- 
mining sub-populations, an investigator should be guided by the 
definition of his problem. Sometimes the determination cannot 
be accomplished until after some analysis of the data has been 
made. The classification finally adopted will often be the result 
of the trial of a series of classifications. Sometimes a careful 
study of the previous researches in the field of the survey will 
suggest an effective classification. When the findings are to be 
compared with surveys previously made, the classifications 
should be similar. In general, it is wise to make the sub-popula- 


¹ For a description of the procedure and other methods, see Garrett, H. E. 
“An Empirical Study of the Various Methods of Combining Incomplete Order 
of Merit Ratings,” Journal of Educational Psychology, 15: 157-71, March, 1924. 

tions rather highly homogeneous. After the data have been 
tabulated, certain sub-populations may be combined if it ap- 
pears that the more elaborate analysis is not useful. 
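
Separate tabulations by size of school, as in the pupils-per-teacher example of this section, may be sketched as follows. The class limits follow the text (under one hundred, then steps of fifty); the function names and the school figures are hypothetical.

```python
from collections import defaultdict


def size_class(enrollment):
    """Assign a school to its enrollment class: under 100, 100 up to 150, and so on."""
    if enrollment < 100:
        return "under 100"
    lower = (enrollment // 50) * 50
    return f"{lower} up to {lower + 50}"


def tabulate_by_size(schools):
    """schools: (enrollment, pupils per teacher) pairs; returns the measures by size class."""
    groups = defaultdict(list)
    for enrollment, pupils_per_teacher in schools:
        groups[size_class(enrollment)].append(pupils_per_teacher)
    return dict(groups)


# Hypothetical schools: (enrollment, pupils per teacher).
schools = [(85, 17.0), (120, 21.5), (149, 23.0), (150, 24.0), (310, 27.5)]
tabulation = tabulate_by_size(schools)
for label, values in tabulation.items():
    print(label, values)
```

Because the finer classes are kept separate in the tabulation, adjacent classes can afterwards be merged simply by joining their lists.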

When the sub-populations are determined by variations in a 
single characteristic, the data may be tabulated in the form of 
a correlation table. The age-grade table is an illustration. If 
salaries of high school teachers are being tabulated, sub-clas- 
sification with respect to both sex and size of school may be ac- 
complished by drawing up a form for a correlation table with 
appropriate salary and size of school intervals and then dividing 
each of the columns to provide separate tabulations for the 
sexes. This form of table, however, becomes too complex when 
the number of subdivisions of a column is greater than two or 
three. 

When the data are quantitative measures such as test scores, 
teachers' salaries, or pupil-teacher ratios, the classification and 
tabulation may be combined in a single process. But in handling 
other types of data, it may be desirable to effect a classification 
before the tabulation is attempted. For example, consider a 
survey of the participation of college students in extra-curricular 
activities in which the raw data include the names of the activi- 
ties each student participates in. Before attempting to tabulate 
this information, its classification should be decided upon and 
the one or more classes in which each student falls should be 
marked on his record. For a given activity, a student may be 
given one of the following classifications: “not participating,” 
“participating in the given activity only,” “participating in the 
given activity and one other,” “participating in the given 
activity and two others,” and so on. The final classification in 
this series might be “participation in the given activity and four 
or more others.” Each student would be given his classification 
for each activity considered in the survey. A more detailed 
classification would result from recognizing each combination of 
activities such as football and basketball, football and track, 
football and baseball, and the like. 
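
The classification of a student with respect to a given activity may be sketched as follows. The class labels paraphrase those of the text; the function name and the sample record are hypothetical.

```python
def participation_class(activities, given):
    """Classify one student's list of activities with respect to the given activity."""
    if given not in activities:
        return "not participating"
    others = len(set(activities) - {given})
    if others == 0:
        return "participating in the given activity only"
    if others >= 4:
        return "participating in the given activity and four or more others"
    words = {1: "one other", 2: "two others", 3: "three others"}
    return f"participating in the given activity and {words[others]}"


# Hypothetical record of one student.
record = ["football", "track", "debating"]
for activity in ("football", "basketball"):
    print(activity, "->", participation_class(record, activity))
```

The function is applied once for each activity considered in the survey, so that every student receives one classification per activity.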

In surveys such as those of the language errors of pupils the 
classification is made on the basis of certain criteria which fre- 
quently must be formulated by the investigator. The problem 
is similar to that involved in collecting data by means of analysis 
which was discussed in Chapter III. The particular criteria that 
should be recognized in a given case depend upon one's purpose. 
An investigator should familiarize himself with a number of 
similar studies. It may be desirable to seek the advice of ex- 
perienced persons. 

Where data are obtained by means of questionnaires or inter- 
view schedules, the returns may be partially incomplete. Hence, 
it is usually advisable to recognize as a sub-class “no data” or 
“not given.” In some cases there may be justification for com- 
bining “no data” with “none.” For example, if teachers are 
asked “How many courses in the history of education have you 
had?” “no data” or “not given” may be taken as “none.” 
It is a better practice, however, to restrict each frequency dis- 
tribution to the number reporting specific information under the 
caption of the distribution. This practice will cause some varia- 
tion in the numbers of cases represented in each distribution. 
If the number of cases is markedly different from that of the 
total population, the data relating to the given characteristic 
are probably less representative than where the per cent of 
response approaches one hundred. It may be advisable to ex- 
clude incomplete blanks from the tabulation. In any case, a 
high per cent of incomplete responses should be recognized as a 
data fault. 

The mechanics of tabulating survey data. The labor of 
tabulating survey data may be materially lessened and the 
accuracy of the results increased by giving attention to the form 
in which the data are collected or recorded, especially when the 
survey is elaborate. Usually it is advantageous to employ 
individual data cards. For example, if the survey is one in 
which there are several items of information such as chrono- 
logical age, sex, intelligence test score, average school mark, and 
vocational interests, the data for each student should be re- 
corded on a separate card. A given item of information should 
appear in the same position on each card. When the number of 
items is large, the card should be ruled so that a space will be 
provided for each item. In many cases this ruling may be con- 
veniently accomplished by mimeographing, but in a compre- 
hensive survey the record card should be printed. The spaces of 
the form may be labeled to indicate the items of information 
recorded in them. A convenient plan of labeling is to number 
the spaces in serial order. A general individual data card may be 
made by dividing the area into squares or rectangles of con- 
venient size and then numbering them serially. Such a card may 
be used in various types of surveys. It may be advantageous to 
use cards of two or more colors. For example, in a survey of 
school children, white cards may be used for boys and buff ones 
for girls. 

When all of the data of a survey are collected by means of a 
single questionnaire, the returned sheets or booklets constitute 
individual records and the desired tabulations can be conven- 
iently made from them. Frequently the tabulation may be fa- 
cilitated by anticipating this procedure in planning the form 
of the question blank. Similar statements may be made relative 
to the blank upon which data are recorded when they are copied 
from records or secured by other techniques. 

Sometimes the process of tabulation may be materially fa- 
cilitated by translating the raw data into a code in which the 
variations in an item of information are represented by numbers. 
In determining this code, it is necessary to anticipate the class 
intervals or rubrics in the subsequent tabulation. When these 
have been determined, they may be numbered 0, 1, 2, 3, . . . 
A given datum is then assigned the number corresponding to 
the class interval or rubric in which it falls. The following are 
illustrations of coding.¹ 

¹ For other illustrations and a discussion of the principles of coding, see 
Toops, H. A. “Some Considerations Relative to the Standardization of Certain 
Procedures in Educational Research,” Journal of Experimental Education, 
1: 229-38, March, 1933. 

A more comprehensive treatment is given by Baehne, G. W. Practical Ap- 
plications of the Punched Card Method in Colleges and Universities. New York: 
Columbia University Press, 1935. 442 pp. 

    Code       Ages of Adults      Code       Curricula
    Numbers    in Years            Numbers

       0       No data                0       Agriculture
       1       20-24                  1       Athletic coaching
       2       25-29                  2       Biological science
       3       30-34                  3       Chemistry
       4       35-39                  4       Commerce
       5       40-44                  5       Education
       6       45-49                  6       Engineering
       7       50-54                  7       Law and pre-legal
       8       55-59                  8       Liberal arts
       9       60 and above           9       Home economics
                                     10       Physics
                                     11       Social science


    Code       Participation in Football and Basketball
    Numbers

       0       No football, or no data
       1       Football only
       2       Football and one other activity
       3       Football and two other activities
       4       Football and three or more other activities
       5       Basketball only
       6       Basketball and one other activity
       7       Basketball and two other activities
       8       Basketball and three or more other activities
       9       No basketball, or no data

Coding is merely a classification of data plus a technique for 
designating the results. Hence, when coding is employed, the 
classification of data is separated from the process of tabulation. 
This is desirable in elaborate surveys, especially when the varia- 
tions to be recorded are of the type shown in the last of the pre- 
ceding illustrations. Incidentally, it should be noted that the 
employment of code numbers is likely to reduce greatly the space 
required for recording the data on individual cards. 
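
The coding of ages according to the first of the preceding illustrations may be sketched as follows. Ages are assumed to be recorded in whole years, a missing age is represented by `None`, and an age under twenty, for which the illustration provides no code, is treated as an error; these conventions, like the function name, are assumptions made for the example.

```python
from collections import Counter

AGE_CLASSES = {
    0: "No data", 1: "20-24", 2: "25-29", 3: "30-34", 4: "35-39",
    5: "40-44", 6: "45-49", 7: "50-54", 8: "55-59", 9: "60 and above",
}


def age_code(age):
    """Translate an age in whole years into its code number."""
    if age is None:
        return 0  # "No data"
    if age < 20:
        raise ValueError("the illustrative code provides no class below 20 years")
    if age >= 60:
        return 9
    return (age - 20) // 5 + 1


# Hypothetical raw data: the classification is done first, the tabulation afterwards.
ages = [22, 31, None, 47, 64, 33]
codes = [age_code(age) for age in ages]
tally = Counter(codes)
for code in sorted(tally):
    print(code, AGE_CLASSES[code], tally[code])
```

Once the codes are recorded, the tabulation reduces to counting code numbers, which is the separation of classification from tabulation that the text describes.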

In surveys of large populations, the labor of tabulating may 
be greatly lessened by employing machines. The basis of 
mechanical tabulation is an individual record card on which the 
data in terms of code numbers are recorded by the position of 
punched holes. After the cards have been correctly punched, 
the sorting and counting is accomplished by machines.¹ 

¹ For a brief description of the use of mechanical tabulation in handling school 

    Sex    High School Class    Age    Intelligence Quotient

Fig. 2. A tabulation sheet for four items of information: sex, high school class, age, and IQ. 

Sometimes an investigator records his data in the form il- 
lustrated in Figure 2. The various items of information for a 
given member of the population are recorded on a single line. In 
Figure 2 the raw data have been classified and the information 
is recorded by means of check marks in the appropriate columns. 
The totals of the checks in the columns give the frequency dis- 
tributions for the four items of information. When the popula- 
tion is greater than the number of lines on the sheet, the sub- 
totals may be carried forward. This type of record sheet is 
inconvenient when separate tabulations of an item are desired 
for the categories of another item. For example, if one wishes to 
obtain the distribution of intelligence quotients of sophomore 
girls, one must identify the checks that refer to these students 
and then make a tabulation on a separate sheet. The labor of 
doing this may be lessened to some extent by making a prelim- 
inary sorting before recording the data so that the information 
relative to sophomore girls will appear on consecutive lines. 
There is, however, a limit to such preliminary sorting. Hence, 
the type of record sheet shown in Figure 2 is seldom to be 
recommended. 
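
The advantage of individual records over a check sheet for such separate tabulations may be illustrated as follows. In this sketch each record plays the part of one data card; the field names, class labels, and records are hypothetical.

```python
from collections import Counter


def distribution(records, item, **conditions):
    """Frequency distribution of one item among records meeting all the conditions."""
    selected = [record for record in records
                if all(record[field] == value for field, value in conditions.items())]
    return Counter(record[item] for record in selected)


# Hypothetical individual records, one per student, already classified.
records = [
    {"sex": "F", "hs_class": "sophomore", "age": 15, "iq_class": "100-109"},
    {"sex": "F", "hs_class": "sophomore", "age": 16, "iq_class": "110-119"},
    {"sex": "M", "hs_class": "sophomore", "age": 15, "iq_class": "100-109"},
    {"sex": "F", "hs_class": "junior", "age": 16, "iq_class": "90-99"},
    {"sex": "F", "hs_class": "sophomore", "age": 15, "iq_class": "100-109"},
]

sophomore_girls = distribution(records, "iq_class", sex="F", hs_class="sophomore")
print(dict(sophomore_girls))
```

The same records yield the distribution for any other sub-population by changing the conditions, with no re-recording of the data.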

Calculations from tabulations. The results of elaborate 
manipulations of data are usually difficult to interpret. Hence, 
in making calculations from tabulations, the investigator should 
use as simple techniques as are consistent with the demands of 
his problem and the nature of his data. Elaborate procedures 
resulting in precisely expressed statistical measures frequently 
lend a false air of dependability to the results obtained. Un- 
critical readers of a report of research are likely to accept at 
“face value” statistics whose calculation is not understood by 
them. The investigator should not, however, over-simplify the 
statistical treatment of his data. For example, an average gives 

data, see Rugg, H. O. Statistical Methods Applied to Education. Boston: Hough- 
ton Mifflin Company, 1917, pp. 66 f. 

Information concerning the various types of tabulating machines may be 
secured by addressing the International Business Machines Corporation, 
Tabulating Machine Division, 270 Broadway, New York City. This company 
maintains offices in a number of other cities. 

only the central tendency of the conditions or practices. It does 
not reveal variations from this central tendency and they may 
be as significant as the average status. Measures of skewness, 
coefficients of variability, and other infrequently used statistics 
may be helpful in indicating the meaning of survey data. 

The techniques for calculating means, medians, modes, 
quartile and percentile points, and measures of variability were 
described in Chapter IV. When the distribution does not ap- 
proximate the normal shape or when the number of cases is 
small, care should be exercised in interpreting these statistics as 
summary measures. For example, if the distribution exhibits 
extreme skewness, the mean may be misleading as a central 
tendency. When the number of cases is small, percentile points 
and even quartile points should not be considered precise 
determinations. In some cases the calculation of these statistics 
may not be advisable. 

In converting the frequencies of a distribution into per cents, 
a convenient procedure is to calculate the reciprocal of N, or the 
base. The desired per cents are then obtained by using this 
quotient as a multiplier of the frequencies. The products may 
be read from an appropriate table or obtained by employing a 
calculating machine. In the latter case the reciprocal is set in 
the keyboard and, using the ordinary procedure of multiplica- 
tion, the frequencies are made to appear successively in the upper 
right dial of the machine and just below each frequency, as it ap- 
pears, is the corresponding per cent. The summation of the per 
cents should equal 100, or 100.00 if two decimal places are car- 
ried, but, because of rounding off of decimal places, it may 
not quite equal this figure. Chaddock ¹ advocates, on logical 
grounds, the reporting of the sum as 100.00 even in cases where 
it is not precisely that amount. Adjustment of certain per cents 
so that the total exactly equals 100 reduces their accuracy and is 
undesirable when the adjusted per cents are given individual 
interpretations. 
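
The conversion of frequencies into per cents by a single multiplier, and the effect of rounding on their sum, may be sketched as follows. The multiplier is taken as 100 times the reciprocal of N so that the products are per cents directly; the function name and the frequencies are hypothetical.

```python
def per_cents(frequencies, places=2):
    """Convert frequencies to per cents with one multiplier, 100 times the reciprocal of N."""
    multiplier = 100 / sum(frequencies)
    return [round(f * multiplier, places) for f in frequencies]


even_case = per_cents([5, 15, 20, 10])   # N = 50; every per cent is exact
thirds = per_cents([1, 1, 1])            # N = 3; every per cent must be rounded

print(even_case, round(sum(even_case), 2))
print(thirds, round(sum(thirds), 2))
```

In the second case the rounded per cents fall slightly short of 100 when added, which is the situation the text has in view.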

¹ Chaddock, R. E. Principles and Methods of Statistics. Boston: Houghton 
Mifflin Company, 1925, p. 412. 

When there is only a single distribution, the base to be used 
in calculating the per cents is usually apparent, but when there 
are several distributions and comparisons to be made, the de- 
termination of the bases to be used may require careful considera- 
tion in order to obtain per cents from which the desired inter- 
pretation may be derived. If no report or usable information 
has been obtained from some members of the sub-populations 
of a survey, the investigator must choose between the “total 
number in a sub-population” and the “number reporting usable 
information.” Usually the latter is preferred as a base in calcu- 
lating per cents. Another case that requires attention is when 
certain individuals or units are included in two or more fre- 
quencies of the same distribution. For example, many of the 
students in a group may report participation in more than one 
extra-curricular activity. If the total number of students re- 
porting is used as the base, the per cents will total more than 
100.¹ However, this is not a matter of concern if the fact of 
multiple participation is noted in presenting the data in a table. 
The statement that 75 per cent of the students participate in 
intramural athletics and 35 per cent participate in extra-mural 
athletics is not likely to be misleading. Such statements are 
applicable to other types of data where the response is multiple. 

In combining per cents from several groups, the items should 
be weighted in terms of the sizes of their respective populations. 
The simple average of 15, 25, and 50 per cent is 30 per cent. 
Suppose, however, that the three populations are respectively 
100, 500, and 1000. Then 

     100 × .15 =  15 
     500 × .25 = 125 
    1000 × .50 = 500 
    ----         --- 
    1600         640 

and 640 is 40 per cent of the cases in the total population. 
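
The weighting of per cents by the sizes of their populations may be sketched as follows, using the figures of the text. The function name is hypothetical.

```python
def combined_per_cent(per_cents, populations):
    """Weight each per cent by its population and express the total cases as a per cent."""
    cases = sum(p * n / 100 for p, n in zip(per_cents, populations))
    return 100 * cases / sum(populations)


per_cents = [15, 25, 50]
populations = [100, 500, 1000]

simple_average = sum(per_cents) / len(per_cents)
weighted = combined_per_cent(per_cents, populations)
print(simple_average, weighted)
```

The simple average understates the combined per cent here because the largest population also has the largest per cent.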

When frequency distributions are expressed in different scale 

¹ In such a case the sum of the per cents is without meaning and, hence, should 
not be given in the table. 

units, comparison of their relative variabilities may be made 
by calculating coefficients of variability. The formula is 

$$V = \frac{100\sigma}{M}$$

If one distribution has a mean of 50 and a standard deviation 
of 15, its coefficient of variability is 30. The relative variability 
of this distribution is the same as one having a mean of 5 and 
a standard deviation of 1.5. The coefficient of variability of 
both distributions is 30. It should be noted that in using this 
formula the assumption is made that the zero points on the 
scales of the compared distributions are true zero points. Hence, 
the use of the formula is justified to the extent that this assump- 
tion is satisfied. However, we make the same assumption in 
the use of the mean itself. If one has calculated the quartile 
points of a distribution, the following formula may be used 
to calculate a coefficient of variability. 

$$V = \frac{100(Q_3 - Q_1)}{Q_3 + Q_1}$$

If the distribution is symmetrical, $\frac{Q_3 - Q_1}{2}$ is equal to the me- 
dian deviation of the distribution and $\frac{Q_3 + Q_1}{2}$ is equal to the 
median. When this condition is satisfied, the formula becomes: 

$$V = \frac{100\,MdD}{Md}$$
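
The two coefficients of variability may be computed as in the following sketch. The first pair of calls uses the figures of the text; the quartile points are hypothetical, as are the function names.

```python
def coefficient_of_variability(mean, standard_deviation):
    """100 times the standard deviation divided by the mean."""
    return 100 * standard_deviation / mean


def quartile_coefficient_of_variability(q1, q3):
    """100 times the difference of the quartile points divided by their sum."""
    return 100 * (q3 - q1) / (q3 + q1)


first = coefficient_of_variability(50, 15)
second = coefficient_of_variability(5, 1.5)
print(first, second)

# Hypothetical quartile points of a symmetrical distribution with median 50.
print(quartile_coefficient_of_variability(40, 60))
```

With quartile points of 40 and 60 the median deviation is 10 and the median 50, so the quartile formula and the median-deviation form give the same value.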

When the population from which data have been secured is 
not large, a frequency distribution is likely to exhibit marked 
irregularities that are not characteristic of a large population 
or universe. Hence, if the data collected are used as a basis for 
generalizing in regard to the shape of the distribution, it is 
desirable to estimate the probable shape for a very large popula- 
tion or universe. One method is to add each frequency to its 
adjacent ones and divide the resulting sums by three as in the 
following illustration. The frequencies at each extreme are 
doubled, added to their neighboring frequencies, and the re- 
sulting sums divided by three. 

    Score      f      f (Smoothed)
     50        1          2
     45        4          3
     40        4          3
     35        1          4
     30        7          5
     25        7          6
     20        4          5
     15        4          4
     10        4          3
      5        1          2
      0        1          1


The method is known as that of moving or rolling averages. 
Sometimes the frequencies are averaged in groups of five, seven, 
or nine. When a distribution has been treated in this way, it no 
longer represents the measures actually secured. High fre- 
quencies are decreased and low ones increased. If the method 
is used injudiciously, it may eliminate significant features. If 
used wisely, the result may be a distribution which better 
represents the true conditions in that some of the errors, or 
fluctuations due to chance or other extraneous causes, have 
been reduced. The method of rolling averages is much used in 
dealing with historical statistics, i.e., the statistics of trends. 
It is used in business statistics to eliminate seasonal or short- 
time cyclical variations from time series in order to reveal 
trends over longer periods of time. 
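
The smoothing by threes described above may be sketched in Python as follows. This is only an illustration under the stated rule (each frequency added to its neighbors and divided by three, with each extreme frequency counted twice); the function name is an assumption introduced here, and the frequencies are those of the illustration in the text.

```python
def smooth_by_threes(frequencies):
    """Three-term moving average of a list of frequencies.

    At either extreme the end frequency is doubled and added to its
    single neighbor before dividing by three, as the text describes.
    """
    count = len(frequencies)
    if count < 2:
        return [float(f) for f in frequencies]
    smoothed = []
    for i, f in enumerate(frequencies):
        left = frequencies[i - 1] if i > 0 else f           # double the first frequency
        right = frequencies[i + 1] if i < count - 1 else f  # double the last frequency
        smoothed.append((left + f + right) / 3)
    return smoothed


# Frequencies for the measures 50, 45, ..., 0 in the illustration.
observed = [1, 4, 4, 1, 7, 7, 4, 4, 4, 1, 1]
print(smooth_by_threes(observed))
```

For these frequencies every sum happens to be divisible by three, so the smoothed values agree exactly with the smoothed column of the illustration. Averaging in groups of five, seven, or nine would need a wider window and a corresponding rule for the extremes, which the text does not specify.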

Techniques for comparing frequency distributions. On pages 
112-13 attention was called to the limitations of an average. 
Both the variability and any irregularities in the distribution are 
neglected. Hence, unless the distributions of the groups of data 
being compared are approximately equivalent in shape and 
variability, the use of only the central tendencies may result 
in an interpretation that is misleading or even erroneous. Even 
when the distributions are approximately equivalent in shape 
and variability, a [...] will usually be secured by comparing 
the distributions. Two distributions may be compared by [...] 
equals or exceeds the central tendency [...]. Bi-serial r may 
also be used as a measure [...]. It has been defined by the 
formula: ¹ 

$$r_{bis} = \frac{M_2 - M_1}{\sigma} \cdot \frac{pq}{z}$$ ² 

in which $M_1$ and $M_2$ are the means of the two distributions, 
$\sigma$ the standard deviation of the distribution formed by 
combining the two, $p$ is the per cent of the total distribution 
[...] distribution from which $M_2$ is calculated, and $z$ is the 
ordinate of the normal probability curve at the point 
corresponding to the per cent $p$ or $q$ of the total area. 


¹ This formula is given in Kelley, T. L. Statistical Method. New York: 
The Macmillan Company, [...]. 

For other formulae see Dunlap, J. W. “A Graphical Method for Computing 
the [...] Error of Bi-serial r,” Journal of Experimental Education, 
2:274-[...]. 

² p + q = 1.00 in the formula, not 100. The value of z may be obtained 
from an appropriate table. If p is greater than .50 (if it is not, 
use q) subtract .5 from it and for this value find z in Table XII of 
Holzinger's Statistical Tables for Students in Education and Psychology. 
The Kelley-Wood table appearing as Appendix [...] of Kelley's Statistical 
Method may also be used. 

Bi-serial r is also used to show the correlation between success and 
non-success on a given test exercise and criterion measures (see page 184) 
or, in general, to show the correlation between a trait measured 
quantitatively and another trait or characteristic which is classified in 
two qualitative categories. It is assumed that the distribution of the 
characteristic underlying the qualitative categories is normal and that 
the regression is linear. In the case of success and non-success on a test 
exercise or, in general, the means used in the above formula refer to the 
sub-series of [...] measures classified under the two categories, $\sigma$ 
refers to the total [...] series, and p and q refer respectively to the 
proportions of cases in each category, the total frequency under each 
rubric being divided by N [...] total quantitative series. For an 
illustration of the calculations, see Holzinger, K. J. Statistical Methods 
for Students in Education. Boston: Ginn and Company, [...]. 
Kelley, T. L. Op. cit., [...]. 

For a formula convenient in analysis of test items, a table, and a 
nomograph, see 

Dunlap, J. W. “Note on Computations of Bi-serial Correlations in Item 
Evaluation,” Psychometrika, 1:[...], June, 1936. 
Symonds ¹ has compared measures of overlapping. For the 
case when the two distributions are normal and represent equal 
populations, he gives a table of the comparable values of (1) 
bi-serial r, (2) difference of the means of the distribution in 
terms of the standard deviation of one of them, and (3) the 
proportion of one distribution that is above the mean of the 
other. The use of bi-serial r is recommended. 
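
As a rough illustration of bi-serial r used as a measure of overlapping, the sketch below computes the difference of the two means divided by the standard deviation of the combined distribution and multiplied by pq/z. It is not part of the original text. The function name is hypothetical, the standard deviation is taken over the combined group with N as divisor (an assumption about the convention intended), and z is obtained from Python's standard normal distribution instead of a printed table.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(first_group, second_group):
    """Bi-serial r for two groups of quantitative measures.

    p is the proportion of all cases that fall in second_group,
    q = 1 - p, and z is the ordinate of the unit normal curve at the
    point that divides the area into the proportions p and q.
    """
    combined = list(first_group) + list(second_group)
    sigma = pstdev(combined)              # divisor N, not N - 1
    p = len(second_group) / len(combined)
    q = 1 - p
    unit_normal = NormalDist()            # mean 0, standard deviation 1
    z = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(second_group) - mean(first_group)) / sigma * (p * q / z)


lower = [1, 2, 3]
upper = [3, 4, 5]
print(round(biserial_r(lower, upper), 3))
```

Reversing the order of the two groups changes only the sign of the result, since p and q enter the formula symmetrically.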

Techniques involved in studying growth or trends. In study- 
ing the growth of the average achievement of a class, a survey 
is made at intervals over a period of months or years. The 
findings from such a series of surveys are frequently referred 
to as a time series. The interpretation is accomplished by com- 
parison. If the average achievements of the class are repre- 
sented as ordinates and the time intervals as abscissa distances, 
the curve joining the points thus located will represent the 
growth of the achievement of the group. The principal con- 
sideration in such studies is that the successive surveys be 
comparable. In studies of the trend of such phenomena as 
school enrollment, the data for surveys of past dates must 
necessarily be secured from records. In case there has not been 
uniformity in recording the data, the investigator faces the 
problem of estimating the adjustments necessary to make the 
measures comparable. Sometimes the growth in achievement 
or some other characteristic of children is studied by securing 
measures of a series of groups of children now in school. The 
possibility that these groups may not be comparable in certain 
significant respects makes it desirable to avoid this procedure 
when possible. 

When the average rate of increase between two dates is 
desired, it cannot be obtained by dividing the total per cent of 
increase by the number of time intervals. It is necessary to 
calculate the geometric mean. Suppose for example that the 
enrollment of a given high school was five hundred in 1920 

¹ Symonds, P. M. “A Comparison of Statistical Measures of Overlapping 
with Charts for Estimating the Value of Bi-serial r,” Journal of Educational 
Psychology, 21:586-96, November, 1930. 
and nine hundred in 1930. The total increase is 80 per cent of 
the initial enrollment. The average rate of increase for the ten- 
year period is not 8 per cent. Let $r$ designate the average rate 
of increase and $P_0$ the enrollment in 1920. The enrollment in 
1921, $P_1$, will be equal to $P_0(1 + r)$. Similarly $P_2$, the 
enrollment in 1922, will be equal to $P_1(1 + r)$, which by substitution 
becomes $P_0(1 + r)^2$. Continuing this reasoning we obtain 
$P_{10} = P_0(1 + r)^{10}$. This equation may be solved for $r$ by 
employing logarithms. 

$$\log P_{10} = \log P_0 + 10 \log (1 + r)$$

$$\log (1 + r) = \frac{\log P_{10} - \log P_0}{10}$$


Substituting the given values of $P_0$ and $P_{10}$ and employing a 
table of logarithms, the value of $r$ is found to be .0605 or 6.05 
per cent, the average annual increase in the enrollment. 

The mathematical relationship involved in the equation 
$P_{10} = P_0(1 + r)^{10}$ is that of compound interest. The final 
equation may be generalized by substituting $n$ for 10. 

$$\log (1 + r) = \frac{\log P_n - \log P_0}{n}$$


If the average rate of decrease is sought, the signs of the terms 
in the numerator of the fraction are changed. 
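
The calculation of the average rate of increase can be reproduced without a table of logarithms. The following Python sketch is an illustration only; the function name is an assumption, and the enrollments are those of the example above (five hundred in 1920 and nine hundred in 1930).

```python
import math


def average_rate_of_increase(initial, final, intervals):
    """Solve final = initial * (1 + r) ** intervals for r.

    Follows the logarithmic form in the text:
    log(1 + r) = (log final - log initial) / intervals.
    """
    log_of_one_plus_r = (math.log10(final) - math.log10(initial)) / intervals
    return 10 ** log_of_one_plus_r - 1


rate = average_rate_of_increase(500, 900, 10)
print(round(rate, 4))        # 0.0605, i.e. 6.05 per cent a year
print(round((900 - 500) / 500 / 10, 4))  # 0.08, the mistaken simple average
```

Compounding five hundred at the computed rate for ten years returns nine hundred, whereas compounding at 8 per cent would overshoot it considerably.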

School enrollments and census data over a period of years 
are frequently studied to determine some means for forecasting 
future enrollments. Several techniques ¹ have been developed 
and employed in school surveys. Forecasting the future from 
the past is based upon the assumption that the trend of the 
past will be continued in the future. This is likely to be approx- 
imated, provided no new factors are introduced into the situation. 

¹ For a brief description of the techniques, see 

Engelhardt, Fred. “Forecasting School Population,” Teachers College, 
Columbia University Contributions to Education, No. 171. New York: Bureau 
of Publications, Teachers College, Columbia University, 1925. 66 pp. 

Chamberlain, L. M., and Crawford, A. B. “The Prediction of Population and 
School Enrollment in the School Survey,” Bulletin of the Bureau of School Serv- 
ice, Vol. 4, No. 3. Lexington, Kentucky: University of Kentucky, March, 1932. 
27 pp. 

New laws relating to school attendance, changes 
in the industrial activities of the community, and the like 
are likely to affect school enrollments and hence make the 
predictions less satisfactory than they would otherwise be. 
Chamberlain and Crawford ¹ compared forecasts of school 
enrollments made in thirty-five city surveys with the actual 
enrollments in 1930-1931. Although the earliest of these 
forecasts was made in 1920 and the median date is 1924, the 
weighted mean error is 11.49 per cent. These authors rec- 
ommend the use of simple and direct methods of forecasting.² 

Graphic methods. The use of graphs to give meaning to 
summarizations of data is characteristic of reports of survey 
research, but a comprehensive treatment of the subject is 
unnecessary here since there are several excellent sources of 
information with respect to graphic methods.³ Frequency 
polygons, histograms or column diagrams, smoothed frequency 
curves, and fitted ones are used in portraying frequency distri- 
butions. Line graphs are useful in indicating trends in enroll- 
ment, costs, expenditures, salaries, and the like, and in illus- 
trating the development of traits.⁴ The typical curves used to 

¹ Op. cit., p. 22. 

² The reader interested in further study of this topic should consult Chaddock, 
op. cit., Chapter XIII, or Smith, J. G. Elementary Statistics. New York: Henry 
Holt and Company, 1934, Chapters XI to XV. The treatment of the techniques 
in these sources is with reference to the field of economics, in which the study of 
trends and forecasting are important topics. 

³ See any standard text in statistics or educational statistics, or for more com- 
prehensive treatments see 

Alexander, Carter. School Statistics and Publicity. New York: Silver, 
Burdett and Company, 1919, Chapters IV and XI. 

Brinton, W. C. Graphic Methods for Presenting Facts. New York: The 
Engineering Magazine Company, 1914. 371 pp. 

Karsten, K. G. Charts and Graphs. New York: Prentice-Hall, Inc., 1923. 
724 pp. 

Williams, J. H. Graphic Methods in Education. Boston: Houghton Mifflin 
Company, 1924. 319 pp. 

For an excellent discussion of the precautions to be observed in drawing graphs 
see Chaddock, op. cit., Chapter XVI. 

⁴ In some situations a logarithmic graph is helpful. See 

Allen, C. B. “Logarithmic Charts,” Educational Administration and Super- 
vision, 20:583-91, November, 1934. 

Allen, C. B. “Rate of Change vs. Absolute Change in School Enrollments,” 
Educational Administration and Supervision, 20:431-37, September, 1934. 
show how mental age increases with chronological age are 
examples of the latter. The familiar bar-graph is a modification 
of the histogram in which the columns are separated and are 
usually colored black. Frequently, the bars run horizontally 
from the vertical axis. The bars may represent frequencies in 
categories of an unordered series, and the order of the bars 
may be in terms of increasing or decreasing frequencies in the 
categories. Sometimes, bars in contrasting symbolism are 
drawn in pairs, or in threes. For example, if one wishes to 
present graphically data relative to the participation of boys 
and girls in extra-curricular activities, pairs of parallel black 
and white bars may be used for each of the activities repre- 
sented in the data. The black and white bars of each pair may 
be drawn in contact with each other. The frequencies should 
be represented by the lengths and not by the areas of the bars. 
The bars should be in the same units, or according to the same 
scale,¹ and should be measured from the same zero point. If 
the lengths of the bars are not made proportional to the fre- 
quencies in this way, the graph will be misleading. The reader 
will be given the impression that differences in the lengths of 
the bars are more significant than they really are. 

It is unwise to use circles of various sizes to indicate relative 
magnitudes. It is difficult to make accurate comparisons of 
such areas. Circles may be used, however, to illustrate the 
proportions of a population which fall into different categories. 
The area of the circle represents 100 per cent, and the sectors 
represent the per cents of each category. A circle divided in 
this way is known as a pie graph. Bars, or rectangles, may also 
be used to represent the per cents of a given sample which 
fall into different categories. When several circles, or bars, 
are used to represent several comparable percentage distribu- 
tions, the dimensions of the bars or circles should be identi- 
cal and the segments or sectors representing comparable per 

¹ The nature of the scale should be indicated on the graph. A good procedure 
is to draw a line above or below the bars on which the scale values are indicated 
at intervals by short vertical lines accompanied by the values of the lower limits 
of the intervals. 

cents should be given in the same order and in the same symbolism. 

The ogive or percentile graph is used in portraying cumula- 
tive frequency percentages. Percentile ranks ¹ may be read 
from percentile graphs and one can easily observe the per cents 
of cases above or below the median, the first and third quartiles, 
or the decile points. One can compare two percentile curves on 
the same chart with respect to the amount of overlapping of 
the distributions, their relative variability, and their respective 
central tendencies.² 
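
The quantities plotted in a percentile graph can be tabulated directly from the measures. The Python sketch below is offered only as an illustration: it takes the percentile rank of a measure to be the per cent of measures below it, and the cumulative percentage at a value to be the per cent of measures at or below it. The function names and the sample scores are assumptions introduced here, not data from the text.

```python
def percentile_rank(measure, measures):
    """Per cent of the measures in the distribution that fall below `measure`."""
    below = sum(1 for m in measures if m < measure)
    return 100 * below / len(measures)


def cumulative_percentages(measures):
    """Points for an ogive: (value, per cent of measures at or below it)."""
    total = len(measures)
    points = []
    for value in sorted(set(measures)):
        at_or_below = sum(1 for m in measures if m <= value)
        points.append((value, 100 * at_or_below / total))
    return points


scores = [10, 20, 20, 30, 40]      # hypothetical scores
print(percentile_rank(30, scores))
print(cumulative_percentages(scores))
```

Plotting the second element of each pair against the first gives the percentile curve; two such curves drawn on the same chart permit the comparisons of overlapping, variability, and central tendency mentioned above.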

Graphical representations of two or more distributions may 
be reduced to comparable scales and superimposed, but if the 
number of groups of data is large, the resulting figure will be 
complex and not easy to interpret. A simpler figure may be 
obtained by representing each distribution by a straight line 
on which the central tendency and certain percentile points are 
indicated. If these lines are arranged in parallel with respect 
to a common scale, the overlapping will be apparent. This 
type of representation is illustrated in Figure 3, on page 242, 
which is taken with minor changes from a Report by the Advis- 
ory Committee on College Testing in The Educational Record for 
October, 1932. In this figure the heavy portions of the vertical 
lines represent the range of the middle two-thirds of the dis- 
tribution of the scores on the English test administered to 
sophomores in the colleges represented. The narrow exten- 
sions show the range from the tenth percentile to the ninetieth 
percentile. The average score for each college is represented 
by a short horizontal line. 

¹ The percentile rank of a measure is the per cent of measures below that 
measure in the distribution. The percentile point nearest which the measure 
falls may be taken as its percentile rank, but if precise values are needed, they 
should be calculated by means of a formula. See Holzinger, K. J. Statistical 
Methods for Students in Education. Boston: Ginn and Company, 1928, pp. 136 f. 

² For discussion of the methods employed in drawing percentile graphs, see 

Holzinger, op. cit., pp. 127-40. 

Odell, C. W. Educational Measurement in High School. New York: The 
Century Company, 1930, pp. 610-15. 

Otis, A. S. Statistical Method in Educational Measurement. Yonkers-on- 
Hudson, New York: World Book Company, 1925, pp. 53-67, 77-84, 95-100. 
[Figure 3; axis label: Sigma Scores] 

Fig. 3. Bar graph of variability of English achievement in several col- 
leges. The “college numbers” designate 16 of the 138 colleges participating 
in the testing program. After Johnston, J. B., et al., “The 1932 College 
Sophomore Testing Program,” Educational Record, 13:306, October, 1932. 

B. Reporting and Interpreting Findings 

Reporting survey studies. In reporting a survey study an 
investigator faces the problem of deciding how much of the 
details to include. In dealing with this problem the principal 
considerations are: (1) the availability of space, (2) the prob- 
able interest of readers in details, and (3) the effectiveness of 
the report in fulfilling its communicative function. If the survey 
is being reported in unpublished form, space is seldom an im- 
portant consideration; but when the report is to be published, 
it is usually desirable to limit frequency tables and other de- 
tails to a minimum. The critical reader is likely to be interested 
in details as a means of determining the dependability of the 
survey, but most readers are not interested in details. They 
wish to learn the principal findings and their interpretation. 
Hence, the author should be guided by the audience for which 
he is writing. Comprehensive surveys covering large areas are 
more likely to command the attention of critical readers than 
minor studies. The presence of many detailed tables tends to 
interfere with a fluent reading of the report, but frequently a 
survey cannot be effectively reported without presenting a 
number of details. For example, if per cents have been calcu- 
lated from the frequencies of a distribution, it is usually desir- 
able to report the absolute frequencies as well as the relative 
ones. 

Few definite rules can be stated. The last sentence of the 
preceding paragraph is one generally recognized. Another rule 
is that the details of calculation should not be reported, but, 
if the statistical procedure employed is not a standard one, it 
should be described. Usually a measure of the variability of a 
frequency distribution should be given as well as its central 
tendency. This is important when precise comparisons are 
being made. If sampling has been employed in collecting data 
or it is desired to generalize from the findings, information 
relative to the representativeness of the data should be given. 
The calculation of the probable error of a mean or median is 
justified only when a process of random sampling has been 
employed in collecting the data or when the group of data may 
be assumed to constitute a random sample. Hence, this statistic 
should not be given unless one or the other of these conditions 
exists. Unfortunately the probable error is frequently introduced 
in survey investigations when no sampling has occurred or 
when the sample is obviously not random. For example, in a 
study ¹ in which no sampling occurred, the following statements 
are made: “The arithmetic mean of the grades earned in resi- 
dence is 2.33 ± 0.043. The mean of the grades earned by cor- 
respondence is 1.95 ± 0.032.” Interpreted literally, these state- 
ments are absurd. 

If it seems desirable to include a large number of details, 
tables that are likely to be of interest only to the more critical 
readers may be placed in an appendix. The reading of a simple 
table need not be described in the accompanying text, but 
when the meaning of the entries will not be obvious to a com- 
petent reader, their interpretation should be given. In all cases 
the captions of the table, especially those of the columns, 
should be formulated with care. 

Interpretation of survey findings. A statement of the median 
score made on a certain achievement test by the seventh-grade 
pupils in City A is of little interest. The statement that the 
median score in City A is ten points less than in City B is more 
meaningful, but usually it is desired to say that the difference 
is an index (indirect measure) of the relative achievement of 
the pupils or of the relative quality of the instruction they have 
received. Hence, the interpretation of the findings in an achieve- 
ment survey may be thought of as involving a comparison and 
usually a change in the label attached to the difference. The 
procedure of the interpretation in other types of surveys is 
similar. 

Dependability of interpretations when generalization is not 
involved. When a difference is labeled as an index (indirect 
measure) of a trait or condition, it is necessary to justify this 
interpretation. In Chapter V, the term dependability was intro- 
duced to refer to the degree of correctness of a statistic when the 

¹ Larson, E. L. “The Comparative Quality of Work Done by Students 
in Residence and Correspondence Work,” Journal of Educational Research, 
25:105-09, February, 1932. 




precise nature of its label is considered. Hence, the justification 
of an interpretation may be thought of as demonstrating the 
dependability of a difference when it is given the desired label. 
This may be accomplished by showing that the difference cannot 
be satisfactorily explained as the sum of contributions from 
other possible causes. For example, justification of the inter- 
pretation of a difference between mean test scores as an index of 
relative achievement is accomplished by showing that it is very 
improbable that the difference is due to the effects of data faults. 
If the interpretation is in terms of relative quality of instruc- 
tion, it is necessary to consider also the quality of the pupil 
material and the amount of instruction to which they have been 
subjected. If the interpretation is generalized, attention must be 
given to the representativeness of the groups tested. 

The details of demonstrating the dependability of an interpre- 
tation vary with the nature of the data and the desired in- 
terpretation, but the general procedure may be illustrated by 
considering the interpretation of differences between mean test 
scores as measures of relative achievement. The effect of variable 
errors upon a mean varies inversely with the square root of the 
number of cases. Hence, the contributions from variable errors 
of measurement and variable errors of validity will be small, 
and if the number of cases is large, they may usually be con- 
sidered negligible. This means that the unreliability of the 
test and its lack of validity, as the term is usually defined, are 
relatively unimportant in surveys and consideration of them 
may be omitted without seriously weakening the argument. 
Frequently an investigator points with pride to the coefficient 
of reliability of the test used and thereby gives the impression 
that consequently his interpretation is highly dependable. This 
is unfortunate because variable errors of measurement whose 
magnitude is indicated by a coefficient of reliability usually 
make only a minor contribution to a difference between means 
or medians. Similar statements may be made with reference to 
indices of validity. 

The contributions from systematic errors of measurement may 
be large. As pointed out in Chapter V, there is no definite 
procedure by means of which the magnitude of the systematic 
error of measurement in a group of test scores may be deter- 
mined. We may, however, obtain some indication of its 
magnitude by inquiring concerning the testing conditions, which 
include the explanation of the test to the pupils, the attitude of 
the person administering the test, the number of minutes allowed 
for responding to the exercises, and other aspects of the ad- 
ministration of the test that influence the mean score of a group. 
If it does not appear that the testing conditions were the same 
for the populations tested, it is likely that a portion of the 
difference is due to a systematic error of measurement. 

When the label attached to the difference or implied in its 
interpretation specifies an achievement other than that measured 
directly by the test, the possibility of a systematic error of 
validity is created. It is seldom that the achievement directly 
measured by a test is the achievement whose specification is 
desired in the interpretation. For example, a sentence dictation 
spelling test measures directly the ability of the pupils to spell 
certain words under the conditions of the test. We usually 
desire, however, measures of the ability to spell the words used 
in typical writing. Hence, when scores from a dictation spelling 
test are labeled “measures of spelling ability,” the possibility of 
errors of validity is created. The ability to respond to a silent 
reading test is not the same as the ability that functions in 
typical reading. The ability to identify statements that are 
true and those that are false is not what we usually mean by 
achievement in a school subject. In addition to the substitutions 
implied by these statements, we frequently substitute measures 
of a sample of achievement for measures of the total achievement 
in a school subject or in a specified segment of it. The tests 
that we designate as measuring achievement actually measure a 
combination of achievement and general intelligence. 

In a given situation the mean status of what the test measures 
directly bears a certain ratio to the mean status of the achieve- 
ment whose measurement is desired. If this ratio is the same 
for two populations, the change of label made by the inter- 
pretation will not introduce a systematic error. For example, 
under a given curriculum and plan of instruction in arithmetic, 
the average calculation achievement of a group of pupils bears a 
certain ratio to their average total achievement in arithmetic. 
If the same balance of curriculum and instruction prevails in 
two populations, the difference between their mean scores on a 
calculation test may be labeled “difference in arithmetical 
achievement” without introducing a systematic error of validity. 
If, however, the curriculum and general plan of instruction are 
different in the two populations, a systematic error of validity 
is likely to be introduced when the difference between mean 
scores on a calculation test is labeled “difference in arithmetical 
achievement.” 

If the difference between mean test scores is interpreted as an 
index of the relative quality of instruction, it is necessary to 
consider also the equivalence of the two groups with reference 
to capacity to learn and the amount of instruction received. 
Measures of capacity to learn are provided by intelligence test 
scores, but they may involve a systematic error. Hence, in 
demonstrating the equivalence of the groups compared by intro- 
ducing scores on an intelligence test, it is necessary to inquire 
concerning the possibility of a systematic error in these measures. 
In determining the amount of instruction, it is necessary to con- 
sider incidental instruction as well as that for which explicit 
provision is made. 

Comparisons with norms are especially difficult to interpret 
because so little information is provided concerning the testing 
conditions and the populations from which the norms have been 
derived. A grade population is indefinite, especially at the high 
school level. Even assuming that the population from which the 
norm was derived is typical of the grade with respect to general 
intelligence and previous training in the general field of the 
test, it may not be typical with reference to the acquaintance of 
the pupils with testing procedures in general or with respect to 
tests of the type administered. Hence, the dependability of 
interpretations involving comparisons with norms is usually 
uncertain. One writer has presented evidence to show that “all 
but a very few of the norms now available for high school tests” 
are “practically worthless for the evaluation of school achieve- 
ment on a relative basis.” ¹ 

Dependability of generalized interpretations. Frequently, 
only a sample of the population specified by the problem is 
measured or it is desired to generalize from the findings of a 
survey. If the sample is perfectly representative, the dependabil- 
ity of the interpretation will not be affected when the findings 
are labeled as measures of a larger population or universe. If 
the sample is not representative, this condition may contribute 
an additional error to the findings as measures of the larger 
population or universe. 

Sometimes it is possible to present evidence to show that a 
sample is highly representative. For example, in his study of the 
ability of ninth-grade pupils to read typical literary selections, 
Irion ² was able to show that the group tested was typical of 
ninth-grade pupils in general. Hence, generalization of his in- 
terpretations did not materially affect their dependability. 

In many cases, the population surveyed is obviously not 
representative or the method of collecting the data is such that 
non-representativeness is highly probable. When the per cent of 
returns in a questionnaire study does not approximate one 
hundred, the data are frequently not representative. What is 
likely to happen is illustrated by a study in which a questionnaire 
was mailed to the graduates of the College of Education at the 
University of Illinois for the purpose of ascertaining the propor- 
tion of the graduates who enter and continue in educational 

¹ Lindquist, E. F. “Factors Determining Reliability of Test Norms,” Journal 
of Educational Psychology, 21:612-20, October, 1930. 

Another pertinent reference on this point is Stokes, C. N., and Finch, F. H. 
“A Comparison of Norms on Certain Standardized Tests in Arithmetic,” 
Elementary School Journal, 32:785-87, June, 1932. 

² Irion, T. W. H. “Comprehension Difficulties of Ninth Grade Students in 
the Study of Literature,” Teachers College, Columbia University Contributions to 
Education, No. 189. New York: Bureau of Publications, Teachers College, 
Columbia University, 1925. 116 pp. 
work. Returns were received from approximately 53 per cent 
of those graduating during the period from 1920 to 1930. It 
seemed reasonable that graduates engaged in educational work 
would be more likely to respond to the questionnaire than 
those not so employed. This hypothesis was supported by data 
obtained from the Alumni Directory. Hence, the per cent of 
those reporting who were engaged in educational work was 
greater than the proportion of all graduates engaged in educa- 
tional work. When the nature of the selection is known, as in 
this case, it may be possible to make estimates that will be more 
dependable than the results obtained from the data. 

Occasionally, it is possible to employ a technique of ran- 
dom sampling in collecting survey data and in certain situations 
the assumption of a random sample may be justified. In such 
cases probable or standard error formulae may be employed 
as a means of calculating the probable effect of chance non- 
representativeness.¹ It is a common practice to compare a 
difference with its probable error as a means of determining its 
statistical significance. It is called statistically significant if the 
probability of the effect of chance non-representativeness being 
large enough to change the sign of the difference is very small, 
usually not more than 3 or 4 out of 1000.² The statistical 
significance of a difference has nothing to do with the contribu- 
tions from systematic errors or other causes. Hence, a statis- 
tically significant difference is not necessarily dependable. It is 
unfortunate that the use of the probable error and its derivative, 
the critical ratio, is frequently discussed under the head of 

¹ See pages 104-05 for formulae. 

² This probability is conveniently obtained by means of the critical ratio 
(CR), which is formed by dividing the difference by its probable error. Tables 
have been prepared which give for various values of the critical ratio the prob- 
abilities that the difference for the universe will have the same sign. For example, 
see Garrett, H. E. Statistics in Psychology and Education. New York: Long- 
mans, Green and Company, 1926, p. 135. 
The term critical ratio was first used in McGaughy, J. R. “The Fiscal Ad- 
ministration of City School Systems.” New York: The Macmillan Company, 
1924, p. 9. The concept of the critical ratio, however, had been employed by 
mathematicians for some time. The experimental coefficient (see Chapter IX, 
page 308), proposed by McCall, is based on the same principles. 
reliability, thereby giving the impression that it is only necessary
for an investigator to consider the probable error when seeking
to determine the dependability of survey findings.
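
The critical ratio described above can be sketched as follows. This is a minimal illustration, not the authors' own worked procedure: it assumes two independent random samples, a normal sampling distribution, the usual relation that the probable error is 0.6745 times the standard error, and hypothetical figures for the two groups.

```python
import math

PE_PER_SE = 0.6745  # probable error = 0.6745 x standard error (normal distribution)


def probable_error_of_difference(sd_a, n_a, sd_b, n_b):
    """Probable error of the difference between two independent sample means."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return PE_PER_SE * se_diff


def critical_ratio(difference, pe_difference):
    """Difference divided by its probable error."""
    return difference / pe_difference


def chance_of_sign_reversal(cr):
    """Probability that chance alone would reverse the sign of the difference."""
    z = abs(cr) * PE_PER_SE            # convert probable-error units to sigma units
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical example: two groups of 100 pupils, each with standard deviation 10.
pe = probable_error_of_difference(10.0, 100, 10.0, 100)
cr = critical_ratio(4.0, pe)
p = chance_of_sign_reversal(cr)
```

Under these assumptions a critical ratio of 4 corresponds to a probability of about 0.0035, which agrees with the "3 or 4 out of 1000" mentioned in the text. The calculation speaks only to chance; it is silent on systematic errors.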

It is seldom feasible to employ a technique of random sampling
in educational research.¹ Hence, usually it will be necessary
to justify the use of probable or standard error formulae by
showing that the groups of data may be regarded as random
samples. The formulae should not be used when the sample
can be shown to be highly representative or when the evidence
indicates that it is non-representative. When a survey is made
within a school system or within a limited area, it is seldom
justifiable to assume that the data collected constitute random
samples of school systems in general. If the high school principals
of a state are invited to administer a general achievement
test to their twelfth-grade students, it is likely that a large
proportion of the favorable responses will be received from the
better schools. Hence, it should not be assumed that the group
of schools administering the test forms a random sample of the
state. On the other hand, it may be argued that the twelfth-grade
group within a given high school may be considered a
random sample of the twelfth-grade students in this school over a
period of years, provided no systematic influence is apparent.
If the rules relative to age at entrance to the first grade, the
promotion policy, and other conditions affecting the age of
high school seniors have not varied, the group may be considered
a random sample relative to chronological age. It should
not be considered a random sample relative to school achievement
if significant changes have been made in the curriculum
or other conditions intended to increase the achievement.
Although the assumption that a sample is random may appear to
be justified, unobserved factors may make the sample a non-random
one. Chaddock² has pointed out that the application
of probable error techniques is probably not justified in the

1 For an elaboration of this point, see pages 58-59 and 109.

2 Chaddock, R. E. “Significance of Infant Mortality Rates for Small
Geographic Areas,” Journal of American Statistical Association, 29: 243-49,
September, 1934.
case of infant death rates for adjacent areas. In other words,
the numbers of infants dying during a given year do not appear
to be random samples of the number of infant deaths over a
period of years. Chaddock describes a case in which determination
of statistical significance resulted in an erroneous
interpretation.

Further consideration of the dependability of interpretations.
Interpretation may be thought of as an attempt to explain
survey findings in terms of their causes. As an illustration,
consider a statewide or regional survey in which a test or battery of
tests is administered to the twelfth-grade pupils in the high
schools of the area. The mean scores of the schools will usually
exhibit a wide range of variability such as is shown in Figure 3
on page 242. Lindquist¹ has reported that in a statewide survey
of achievement in Iowa high schools, the variability of the mean
scores was approximately half of the variability of the individual
scores. Such findings suggest similar differences in mean achievement
and in school efficiency, but this hypothesis must be tested
by estimating how much of the variability of the mean scores can
be explained as being due to other causes. If it appears at all
likely that the variability is due to other causes, the interpretation
that there are differences in mean achievement or in school
efficiency is not justifiable. If it appears that a large proportion
of the variability is due to other causes, then it must be concluded
that the differences in mean achievement or in school
efficiency are much less than the findings indicate. If the
magnitude of the contributions from other causes is uncertain,
then the interpretation must be considered uncertain.

If the interpretation is generalized, that is, if the calculated
statistics are considered typical of the schools over a period of 
years, the non-representativeness of some of the groups of pupils 
tested is a possible cause of the variability of the means. If 
careful inquiry reveals no information to the contrary, it may be 
assumed that the twelfth-grade pupils of a given year constitute 

1 Lindquist, E. F. “Factors Determining Reliability of Test Norms,” Journal
of Educational Psychology, 21: 512-20, October, 1930.




a random sample of the twelfth-grade pupils of a school over a 
period of years. On the basis of this assumption, a portion of 
the variability of the mean scores may be explained as being
due to the effect of chance. In his study, Lindquist estimates 
that chance might make the variability of the mean scores about 
one-fifth as great as that of the individual scores. In the case 
of some of the pairs of schools, chance is likely to be responsible 
for a large portion of the observed difference. 
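
The comparison of the two figures cited from Lindquist can be put in rough numerical form. The sketch below is not Lindquist's procedure; it assumes that the chance contribution is independent of all other contributions, so that variances add, and it takes the standard deviation of individual scores as the unit. The relation between the chance variability of a mean and the size of the group is the usual standard error of a mean.

```python
import math


def chance_sd_of_means(sd_individual, pupils_per_school):
    """Standard error of a school mean under random sampling."""
    return sd_individual / math.sqrt(pupils_per_school)


def chance_share_of_variance(sd_means_observed, sd_means_chance):
    """Fraction of the variance of the means attributable to chance."""
    return (sd_means_chance / sd_means_observed) ** 2


def residual_sd(sd_means_observed, sd_means_chance):
    """Variability of the means left over after removing the chance component."""
    return math.sqrt(sd_means_observed ** 2 - sd_means_chance ** 2)


sd_individual = 1.0                      # unit: SD of individual scores
observed = 0.5 * sd_individual           # observed SD of school means (about one-half)
chance = 0.2 * sd_individual             # SD expected from chance (about one-fifth)

share = chance_share_of_variance(observed, chance)
remaining = residual_sd(observed, chance)
```

On these assumptions chance accounts for about 16 per cent of the variance of the means, and the remaining variability is still to be divided among true differences, systematic errors of measurement, and systematic errors of validity. A chance variability of one-fifth would correspond, by simple arithmetic, to groups of about twenty-five pupils.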

If the range of the mean scores is relatively small, it may be
desired to ascertain whether their variability may be explained
as probably due to chance alone. One method is to calculate
the correlation ratio, which may be obtained from the following
relationship:¹

\[ \eta^2 = \frac{\sum n_i (M_i - M_t)^2}{\sum (X - M_t)^2} \]

The numerator of the fraction is the weighted sum of the
squares of the deviations of the means of the several groups
from the mean of the total distribution ($M_t$), and the denominator
is $N_t$ times the square of the standard deviation of the total
($N_t\sigma_t^2$). After $\eta$ has been calculated, it may be tested for
statistical significance.² If the correlation ratio is not significant, it may
be concluded that the variability of the means is explainable
as possibly due to chance. Fisher³ has developed a somewhat

1 This formula can be derived from that given on page 98. See also page 88.

2 For an illustration of this method, see Reitz, Wilhelm. “Statistical
Techniques for the Study of Institutional Differences,” Journal of Experimental
Education, 3: 11-24, September, 1934.

3 Fisher, R. A. Statistical Methods for Research Workers. London: Oliver and
Boyd, 1928, Chapter VII.

See also Snedecor, G. W. Calculation and Interpretation of Variance and
Covariance. Ames, Iowa: Collegiate Press, Inc., 1934, Part I.

For applications of Fisher’s technique, see

Reitz, Wilhelm, op. cit.

Lyon, V. E. “The Variation of High School Senior and College Freshman
Classes,” Journal of Experimental Education, 3: 25-35, September, 1934.

Lyon gives mean intelligence percentile scores for 108 high schools for five
successive years. The means for a given school vary from year to year. In some
cases, the range is relatively large. Chance is doubtless the principal cause of
this variation, but it may be contributed to by systematic influences. The reader
who consults these references should note that neither of the authors appears to
recognize the possibility that the variability of the means of a group of schools
different technique. It should be noted that demonstration
of the statistical significance of $\eta$ does not prove the dependability
of the interpretation. When there is reason to expect
that the means are subject to systematic errors of either
measurement or validity or both, a crude estimate of the probable
effect of chance, such as Lindquist made, will be a more
helpful procedure.
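
The correlation ratio defined above can be computed directly from the scores grouped by school. The sketch below uses hypothetical scores. The F ratio shown is the standard analysis-of-variance form for testing a correlation ratio; it is offered as an assumption and is not necessarily the test the authors had in mind. The resulting value would still have to be compared with tabled values for the appropriate degrees of freedom.

```python
def correlation_ratio_squared(groups):
    """Between-group sum of squares divided by total sum of squares."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    total = sum((x - grand_mean) ** 2 for x in all_scores)
    return between / total


def f_ratio(eta_sq, n_groups, n_total):
    """Analysis-of-variance F ratio corresponding to a squared correlation ratio."""
    return (eta_sq / (n_groups - 1)) / ((1 - eta_sq) / (n_total - n_groups))


# Hypothetical scores for three small schools.
schools = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
eta_sq = correlation_ratio_squared(schools)
f = f_ratio(eta_sq, n_groups=len(schools), n_total=sum(len(g) for g in schools))
```

Even a large ratio of this kind shows only that chance alone is an unlikely explanation of the differences among the means; it cannot separate real differences from systematic errors.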

In a cooperative testing program there are likely to be some
variations in testing conditions which will contribute to the
variability of the mean scores. Even when there is a conscientious
attempt to follow the letter of the instructions, there
may be accidental variations and differences in the attitude of
the examiners which will result in systematic errors of measurement.
Furthermore, since the measurement of achievement is
indirect, errors of validity are possible, and variations in the
curriculum and general plan of instruction¹ are likely to introduce
systematic errors of validity when the scores are labeled
measures of achievement. If the interpretation is in terms of
relative efficiency of the schools, the total effect of the systematic
errors of measurement and the systematic errors of validity may
be sufficient to explain many of the larger differences
between mean scores. Hence, a large proportion of the variability
of the mean scores which may appear as very astonishing to
an uninformed person is probably due to systematic errors of
measurement, systematic errors of validity, and non-representativeness.
However, the uncertainty in regard to the magnitude
and direction of the contributions from these sources makes the
interpretation of the findings of such a survey very hazardous.
It may be that if corrections could be made for the effect of
non-representativeness, systematic errors of measurement, and
systematic errors of validity, a school occupying a relatively low

for a single test may be contributed to by errors that are systematic within
schools.

1 The objectives towards which instruction is directed may vary not only
from school to school but also within the same school from year to year.

Remmers, H. H. “The Stability of Relative Excellence of High Schools,”
School and Society, 38: 412-16, September 23, 1933.
position in the distribution of mean scores would be shown to
belong near the top of the distribution.¹

As another illustration, consider a survey of the duties performed
by high school principals. A request to keep a diary of
their duties for a period of two weeks is sent to two hundred high
school principals selected at random from a list of the accredited
schools within a certain area. Apparently complete diaries are
received from fifty-seven principals. From this information,
there is compiled a list of the duties reported and the average
number of minutes per week devoted to them by the fifty-seven
principals. In interpreting these findings as “distribution of
time of typical high school principal” or as “indices of the
importance of duties performed by high school principals,” several
possibilities of error must be considered. It is not likely that the
records kept by the principals are subject to a systematic error
in the case of the more overt duties. The records, however, may
be in error in the case of duties that involve considerable deliberation,
such as the planning of school policies. Unless the terms
employed to designate the duties are carefully defined, it is
likely that there may be some misunderstanding in regard to
what the designated duty included. Furthermore, the duties
performed during the two weeks may not be representative of
the school year, and the group of principals reporting data may
not be typical. Non-representativeness in either case will affect
the dependability of generalizations. Hence, the dependability
of an interpretation of the findings as “distribution of time of a
typical high school principal” should be regarded as uncertain.
An interpretation of the average number of minutes per week
as indices of the relative importance of the duties is likely to be
less dependable.

A determination of the consensus of opinion relative to certain 
practices or issues is frequently attempted by employing a ques- 

1 The uncertainty of the dependability of the interpretation of the findings in
statewide or regional surveys of school achievement constitutes a strong argument
against such surveys. There are, however, other considerations. See Douglass,
H. R. “The Effects of State and National Testing on the Secondary School,”
School Review, 42: 497-509, September, 1934.
tionnaire consisting of groups of statements each of which expresses
a number of opinions relative to a given topic or question.
The recipient is asked to check within each group the
statement that most nearly represents his opinion or belief relative
to the designated topic or question. Suppose such a questionnaire
is mailed to one hundred “representative” educators
and that usable replies are received from forty. In considering
the dependability of the statement checked most frequently as
the consensus of opinion of “representative” educators, attention
should be given to the representativeness of the population
responding to the questionnaire and to the accuracy of the
statements checked as expressing the opinions of the several
correspondents.
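
How much the sixty missing replies could matter may be seen from a simple worst-case calculation. The sketch below keeps the figures of one hundred questionnaires and forty replies from the illustration; the count of twenty-two persons checking the most popular statement is hypothetical. The only assumption is that nothing whatever is known about those who did not reply.

```python
def share_bounds(checked_by, n_replies, n_mailed):
    """Lowest and highest possible share of the whole group holding the view."""
    non_respondents = n_mailed - n_replies
    lowest = checked_by / n_mailed                       # no non-respondent agrees
    highest = (checked_by + non_respondents) / n_mailed  # every non-respondent agrees
    return lowest, highest


n_mailed, n_replies = 100, 40
modal_count = 22   # hypothetical number checking the most popular statement

share_among_replies = modal_count / n_replies
lowest, highest = share_bounds(modal_count, n_replies, n_mailed)
```

A statement checked by a majority of those replying may thus be held by anywhere from a small minority to a large majority of the persons addressed, which is why the representativeness of the respondents must be examined before the result is called a consensus.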

The fact that a person occupies a position of prominence or
has attained a reputation for good judgment does not make him
competent with reference to all questions and issues. It is a
frequent observation that a person whose acquaintance with a
field is casual is more willing to express an opinion and is more
certain in his beliefs than an intensive student of the field.
Furthermore, if several persons, regarded as authorities in a
given field, are asked for an opinion with reference to a particular
question or issue, some of them will probably give the matter
careful consideration while others will answer in a casual manner.
In such a case, the authorities responding should not be
regarded as equally competent. In other words, the competence
of a person depends in part upon the manner in which he responds.
A person should not be regarded as competent merely
on the basis of his general reputation or his reputation in a field
not identified with the particular question or issue being studied.
Hence, it is likely that the forty persons replying to the request
are not representative of “competent” educators in general.

When the most frequently checked statements are designated
as the most correct beliefs or opinions, there is the implication
that the judgments in one direction from the truth balance
those in the opposite direction. This is likely to be the case when
a physical magnitude such as the distance between two points is
estimated by a number of competent persons and the average of
the estimates is likely to approximate the true magnitude. This
balance of deviations from the truth may not prevail when
opinions or beliefs are expressed relative to educational questions,
especially when controversial matters are involved. Tradition,
prevailing public opinion, or other influences may cause the
expressed opinions to be biased, even when statements have
been secured from persons regarded as competent. For example,
expressions concerning the value of Latin as a secondary school
subject secured from teachers of Latin, especially those occupying
positions of prominence, will be biased. A consensus of
opinion relative to the desirability of German as a secondary
school subject obtained during the academic year of 1917-1918
would not have been dependable. In other words, opinions may
involve a systematic error, and when this occurs the consensus of
the opinions expressed should not be regarded as the truth or the
wisest belief.¹
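
The contrast between errors that balance and errors that do not can be illustrated by a small simulation. Everything in it is hypothetical: the true value, the spread of individual judgments, and the size of the shared bias are chosen only for illustration, and the judgments are assumed to scatter normally about their center.

```python
import random
import statistics


def average_estimate(true_value, n_judges, spread, bias=0.0, seed=0):
    """Average of many judgments that scatter about (true_value + bias)."""
    rng = random.Random(seed)   # fixed seed so the illustration is repeatable
    estimates = [true_value + bias + rng.gauss(0.0, spread) for _ in range(n_judges)]
    return statistics.mean(estimates)


# Judgments whose errors fall equally on both sides of the truth.
unbiased = average_estimate(true_value=100.0, n_judges=10_000, spread=10.0)

# Judgments sharing a common leaning of 5 units in one direction.
biased = average_estimate(true_value=100.0, n_judges=10_000, spread=10.0, bias=5.0)
```

Increasing the number of judges brings the first average closer to the true value, but it leaves the second average displaced by the full amount of the shared bias. A consensus, however large, does not remove a systematic error.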

If a series of surveys are being utilized as a basis for studying
trends, the apparent changes may be misleading. For example,
consider a study in which a test is administered to pupils in
grades six to twelve inclusive as a means of ascertaining the
growth in language ability. In interpreting the differences between
the mean scores of the successive grades as indices of the
growth in language ability, it is necessary to consider the
probable contributions from other sources. Chance may make
some of the differences larger, but these will likely be balanced
by ones made smaller. Hence, chance will not be likely to affect
the general trend. Neither will the general trend be likely to be
affected by systematic errors of measurement. Due to the curriculum,
the general plan of instruction, and the increasing
maturity of the pupils, the ratio of what the test measures directly
to language ability in grade six may not be the same as the
corresponding ratio for the years of the senior high school.
Hence, the trend defined by the mean scores for the successive

1 For a discussion of consensus of opinion in curriculum construction, see
Chapter XII.
grades may not correctly represent the growth in achievement.
There may be additional error due to the selection of the successive
grade groups. If the trend in the enrollment of a school
is being studied, attention must be given to the comparability of
the data for the successive years.

Appraisal of survey findings in terms specifying degrees of
value. Comparison of survey findings from two or more populations
or units, or comparison with norms may be thought of as
appraisal in terms of more or less. Frequently an appraisal in
terms of value is desired. Studies reported by Fowlkes¹ and
Heck² afford illustrations of this type of appraisal. In both
researches, interpretations of the data collected are made in
terms of what practices are to be held superior.

Appraisal of this type requires a criterion, i.e., specification of
what is desirable or what should be. Such criteria cannot be
determined objectively. An investigator may survey and determine
the “average” present practice or ascertain a consensus
of opinion of authorities, but somewhere in the process he must
decide what the criterion is to be. The report of Thomas³ on
public school plumbing equipment is an excellent illustration of
the derivation of criteria. Thomas studied the literature dealing
with plumbing equipment for schools, interviewed experts,
and observed the plumbing in schools, office buildings, hotels,
and public comfort stations. With the information from these
sources at hand, he formulated the criteria which he used in the
evaluation of present practices in public school plumbing equipment
and in making recommendations relative to the plumbing
equipment of new school buildings.

Accounting systems and plumbing equipment are tangible 


1 Fowlkes, J. G. “The Accounting of Public School Expenditures in Wisconsin,”
University of Wisconsin, Bureau of Educational Research Bulletin, No. 4.
Madison: University of Wisconsin, 1924. 59 pp.

2 Heck, A. O. “A Study of Child-Accounting Records,” The Ohio State University
Studies, Vol. 2, No. 9, Bureau of Educational Research Monographs,
No. 2. Columbus: Ohio State University Press, 1925. 245 pp.

3 Thomas, M. W. “Public School Plumbing Equipment,” Teachers College,
Columbia University Contributions to Education, No. 282. New York: Bureau
of Publications, Teachers College, Columbia University, 1928. 128 pp.
and the criteria arrived at by a discerning and critical student
of present practices are likely to be acceptable to most persons.
Acceptable criteria for appraising pupil achievement are more
difficult to determine. Lists of educational objectives are available,
but authorities do not agree, especially when the objectives
are sufficiently detailed to serve as a basis for evaluating pupil
achievement. Test norms indicate the achievement that characterizes
the average or typical pupil, but they fail to indicate
whether or not greater achievement is to be valued. High
achievement in handwriting may be decreasingly desirable in
a civilization which values less and less the ability to write
attractively as well as legibly. High achievement in Greek
or Latin may be decreasingly desirable in a civilization
which places a decreasing value on a knowledge of these languages.

It is somewhat futile to attempt to estimate the possible
dependability of appraisals of present practices and conditions
in terms of value or desirability. They may be regarded as
“sound” or “justified” to the extent that it is evident that they
are made with due consideration of all relevant factors. A
further test of the justification of the appraisals of this type
is the extent to which they are instrumental in the improvement
of practices or conditions, but in many cases it would be difficult
to determine what constitutes improvement. The possibilities
and limitations of methods that may be employed in the solution
of problems implying a determination of “what should
be” are considered at greater length in Chapter XII.

The identification of relationships. The study of trends 
referred to on page 215 may be thought of as an attempt to 
determine the relation between time and the status being 
studied. Other types of survey data are frequently studied to 
determine relationships. As pointed out on page 215, such 
studies are beyond the scope of this chapter when correlation 
analysis is employed, but it may be noted that in several cases 
the interpretation of comparative survey findings implies the 
recognition of a relationship and an explanation of a difference 
in status is essentially an attempt to identify one or more
causal relationships.

ILLUSTRATIVE BIBLIOGRAPHY OF SURVEY INVESTIGATIONS 

This illustrative bibliography is supplementary to surveys mentioned in
the preceding pages. In compiling it the authors have endeavored to include
illustrations of a variety of techniques for collecting data rather than to
present a list of model surveys. A number of the studies are subject to
criticism, in some cases rather serious ones. The annotations are intended to
indicate the general procedure of the survey. No attempt is made to describe
the findings or to evaluate them. A number of survey studies relating
to the curriculum are given in the bibliography of Chapter XII.

1. Anderson, E. M. “Individual Differences in the Reading Ability of
College Students,” University of Missouri Bulletin, Vol. 29, No. 39,
Education Series No. 25. Columbia, Missouri: University of Missouri,
1928. 77 pp.

Data were collected by administration of tests. The representativeness of
the 237 college students is considered.

2. Andrus, Ruth. “A Tentative Inventory of the Habits of Children
from Two to Four Years of Age,” Teachers College, Columbia University
Contributions to Education, No. 160. New York: Bureau of
Publications, Teachers College, Columbia University, 1924. 50 pp.

Data were secured by systematic observation of nursery school children
and through examination of unpublished data and descriptions of observed
behavior recorded in over one hundred books and articles.

3. Baird, D. O. “A Study of Biology Notebook Work in New York
State,” Teachers College, Columbia University Contributions to Education,
No. 400. New York: Bureau of Publications, Teachers College,
Columbia University, 1929. 118 pp.

Fifty-two representative biology notebooks were analyzed with respect
to nature and number of experiments reported, kind and amount of drawing,
vocabulary used, quality of handwriting, and certain other items. A
basis of appraisal was effected by securing data from several sources.

4. Barr, A. S. Characteristic Differences in the Teaching Performance of
Good and Poor Teachers of the Social Studies. Bloomington, Illinois:
Public School Publishing Company, 1929. 127 pp.

Data were collected by general observations, attention chart, time chart,
stenographic report, and by letters from superintendents and teachers.

5. Barr, A. S., and Gifford, C. W. “The Vocabulary of American
History,” Journal of Educational Research, 20: 103-21, September,
1929.

Eight representative American history texts were subjected to vocabulary
analysis, without sampling. Certain words were excluded on the basis of
ten criteria, one of which was the first three thousand words of the Thorndike
Wordbook list.

6. Bartlett, L. W. “State Control of Private Incorporated Institutions
of Higher Education,” Teachers College, Columbia University Contributions
to Education, No. 207. New York: Bureau of Publications,
Teachers College, Columbia University, 1926. 95 pp.

The sources of data consulted in this study include decisions of the
United States Supreme Court, laws of states governing the incorporations
of institutions of higher education, and charters of selected private colleges
and universities.

7. Bender, J. F. “The Functions of Courts in Enforcing School Attendance
Laws,” Teachers College, Columbia University Contributions
to Education, No. 262. New York: Bureau of Publications, Teachers
College, Columbia University, 1927. 187 pp.

Sources of data include reports of state and city superintendents, state
school surveys, court records, and state school laws. The author attended
hearings of more than two hundred cases involving violation of attendance
laws.

8. Bennett, H. E. “A Study of School Posture and Seating,” Elementary
School Journal, 26: 50-57, September, 1925.

Facts and opinions concerning seating were collected from the literature
on school hygiene. Medical literature was consulted with respect to information
on the relations of sedentary postures and physical defects. Extensive
and intensive observational studies were made of school children in
which five thousand individual posture records were obtained. Measurements
of more than 2500 children in all proportions pertinent to seating
and desk dimensions were made by means of an elaborate apparatus devised
for the purpose.

9. Blankenship, A. S. “The Accessibility of Rural Schoolhouses in
Texas,” Teachers College, Columbia University Contributions to
Education, No. 229. New York: Bureau of Publications, Teachers
College, Columbia University, 1926. 62 pp.

The investigator compiled maps from school census data and data in the
offices of county superintendents showing locations of rural school sites and
rural dwellings. A questionnaire was sent to each of the schools of Texas
employing transportation. Facts pertaining to area, length of transportation
routes, and number of teachers employed were requested.
10. Book, W. F., and Harter, R. S. “Mistakes Which Pupils Make in
Spelling,” Journal of Educational Research, 19: 106-18, February,
1929.

The sources of data used in this investigation were 3096 spelling test
papers of pupils in the second to eighth grades, 608 compositions of first
and second year high school pupils, and 1492 themes of college freshmen.

11. Buckner, M. A. “A Study of Pupil Elimination in the New Haven
High School,” School Review, 39: 532-41, September, 1931.

In addition to obtaining data from school records, the investigator employed
the interview technique with respect to a sample of 196 pupils that
had dropped out of school.

12. Caliver, Ambrose. “A Personnel Study of Negro College Students,”
Teachers College, Columbia University Contributions to Education,
No. 484. New York: Bureau of Publications, Teachers College,
Columbia University, 1931. 146 pp.

Data were collected by means of questionnaires, by examination of
records, and by administration of tests.

13. Chadsey, C. E., et al. “The Status of the Superintendent,” First
Yearbook of the Department of Superintendence. Washington: National
Education Association, 1923. 206 pp. For description see page 2.

14. Chapman, J. C., and Eby, H. L. “A Comparative Study, by Educational
Measurements, of One-Room Rural-School Children and
City-School Children,” Journal of Educational Research, 2: 636-46,
October, 1920.

In the interpretation of the data, interesting use is made of the Pearson
Coefficient of Variation ($100Q/M$. In the formula ordinarily used, $Q$ is
replaced by $\sigma$.)

15. Christofferson, H. C. “College Freshmen and Problem Solving in
Arithmetic,” Journal of Educational Research, 21: 15-20, January,
1930.

A standardized arithmetic test was administered to a group of 99 college
freshmen in this study. The investigator reports an analysis of the errors
made on the test and also a comparison of the 25th percentile, median, and
75th percentile of those students with the corresponding eighth-grade
norms.

16. Counts, G. S. “The Social Composition of Boards of Education,”
Supplementary Educational Monographs, No. 33. Chicago: University
of Chicago Press, 1927. 100 pp.

Questionnaires were sent to superintendents to secure factual data relative
to their boards of education, and returns were secured from 1654
superintendents widely distributed about the country.

17. Elder, Vera, and Carpenter, H. S. “Reading Interests of High
School Children,” Journal of Educational Research, 19: 276-82,
April, 1929.

A questionnaire was used in this study to secure data relative to the
reading interests of 487 girls on all grade levels of one city high school.
These data were supplemented by information obtained from reports of
outside reading done by these students.

18. Elsbree, W. S. “Teacher Turnover in the Cities and Villages of
New York State,” Teachers College, Columbia University Contributions
to Education, No. 300. New York: Bureau of Publications,
Teachers College, Columbia University, 1928. 88 pp.

Data were collected by means of a report blank filled out by 71 of the
83 superintendents in New York State, exclusive of New York City and
Buffalo. Additional data were secured from records in the Teachers Retirement
Bureau and State Department of Education.

19. Fitch, H. N. “An Analysis of the Supervisory Activities and Techniques
of the Elementary School Training Supervisor in State Normal
Schools and Teachers Colleges,” Teachers College, Columbia University
Contributions to Education, No. 476. New York: Bureau of
Publications, Teachers College, Columbia University, 1931. 130 pp.

A check list of supervisory activities was prepared by examination of
student teaching manuals and of previous research reported in the field.
The list of 422 items was distributed to 779 supervisors. Usable returns
were received from 355 supervisors.

20. Flowers, I. V. “The Duties of the Elementary-School Principal,”
Elementary School Journal, 27: 414-22, February, 1927.

The investigator sent a request to 170 principals of elementary schools to
keep diaries of their daily work for a period of two weeks. Replies were
received from 67 principals.

21. Foster, J. C. “Distribution of the Teachers’ Time among Children
in the Nursery School and Kindergarten,” Journal of Educational
Research, 22: 172-83, October, 1930.

The data were collected by observation in which the record blank included
the following items: name of child, activity of child, whether child
came to teacher of his own accord, and the number of seconds spent by the
teacher on the child.

22. Henry, N. B. “A Study of Public School Costs in Illinois Cities,” The
Educational Finance Inquiry, Vol. 12. New York: The Macmillan
Company, 1924. 82 pp.

Data were collected by examination of the records of twelve Illinois cities
ranging in size from eleven to seventy-six thousand. Supplementary information
was secured through interviews. The study is interesting with
respect to care used in reducing data to comparable bases.

23. Hewitt, Alden. “A Comparable Study of White and Colored Pupils
in a Southern School System,” Elementary School Journal, 31: 111-19,
October, 1930.

The Illinois Examination, with the exception of its arithmetic test, was
administered to ninety colored and to eighty-five white seventh-grade
pupils.

24. Inman, J. H. “The Training of Iowa High School Teachers in Relation
to the Subjects They Teach,” University of Iowa Studies in Education,
Vol. 4, No. 9. Iowa City, Iowa: University of Iowa, 1928.
66 pp.

Questionnaires were sent to 2000 graduates of eleven colleges in Iowa
who had had from one to five years of high school teaching experience.
Usable returns were received from 1048. Additional data were secured
from the records of the several colleges.

25. Kelly, E. L., and Whitney, F. L. “Educational Magazines Read by
Five Hundred Elementary School Principals and Classroom Teachers,”
Elementary School Journal, 29:176-80, November, 1928.
Check lists of educational magazines were sent to elementary principals
and teachers, members of the Department of Elementary School Principals.
The respondents were located in 156 cities of all sizes in forty states and
the District of Columbia.

26. Kelly, F. J. The American Arts College, A Limited Survey. New York:
The Macmillan Company, 1925. 198 pp.
Data were collected by visits to four state universities, three endowed
universities, five endowed colleges, and one city university. Brief visits
were made to four other institutions. On these visits conferences were held
with presidents, deans of the arts colleges, representative members of the
faculties, and with student leaders. Additional data were secured from
alumni by means of a questionnaire.

27. Klein, A. J., et al. “Survey of Land-Grant Colleges and
Universities,” United States Office of Education Bulletin, 1930, No. 9.
Washington: Government Printing Office, 1930. Vol. I, 998 pp. Vol. II,
921 pp.
Questionnaires prepared by specialists with the aid of advisory
committees were mailed to the faculties and former students of land-grant
institutions. The magnitude of this survey investigation is indicated by
the fact that 12,032 staff members and 37,342 former students filled out
questionnaires. The care used in tabulating and organizing the data is
indicated by the fact that “When errors, discrepancies, and omissions were
discovered by the specialists the pages in question were returned to
the institution with a request for correction or explanation.”

28. Longshore, W. T., et al. “The Elementary School Principalship,”
The Seventh Yearbook of the Department of Elementary School
Principals. Washington: National Education Association, 1928,
pp. 132-638.
Data used in this study were secured from previous research and from the
administration of several questionnaires. Case studies were made of a
number of outstanding principals.

29. Lundeen, G. E., and Caldwell, O. W. “A Study of Unfounded
Beliefs among High School Seniors,” Journal of Educational Research,
22:257-73, November, 1930.
A list of 200 unfounded beliefs of wide geographical distribution was
compiled in the form of a questionnaire to which students could respond
with information respecting whether or not they had heard, believed, or
were influenced by the beliefs. One thousand thirty questionnaires were
filled out by high school seniors in ten high schools. Similar data were
secured for purposes of comparison from 294 college students in three
colleges.

30. McGaughy, J. R. “The Fiscal Administration of City School
Systems,” The Educational Finance Inquiry, Vol. 5. New York: The
Macmillan Company, 1924. 95 pp.
Data were collected with respect to 377 widely distributed cities from
documents of the United States Office of Education, of the National
Committee for Chamber of Commerce Cooperation with the Public Schools,
and from the Commercial and Financial Chronicle.

31. McIntosh, H. W., and Schrammel, H. E. “A Comparison of the
Achievement of Eighth-Grade Pupils in Rural Schools and in Graded
Schools,” Elementary School Journal, 31:301-06, December, 1930.
Distributions of scores of 1921 pupils in graded schools and 1611 pupils
in rural schools are presented in this study. Comparisons are made between
the measures of central tendency and variability of these two groups of
pupils.

32. Maxwell, C. R., et al. “A Report on College Freshmen for the
First Semester of 1928-29,” North Central Association Quarterly,
4:484-600, March, 1930.
A questionnaire was sent to the secondary schools accredited by the
association requesting the names of graduates and the colleges in which
they enrolled. A questionnaire was then sent to the colleges with the names
of reported entrants listed. This questionnaire requested information with
respect to each entrant's success or failure in several college subjects.

33. Melchior, W. T. “Insuring Public School Property,” Teachers
College, Columbia University Contributions to Education, No. 168.
New York: Bureau of Publications, Columbia University, 1925. 187 pp.
A complicated and lengthy questionnaire was sent to every school
district of New York State under the sponsorship of the State Department
of Education. A 57 per cent response was secured and questionable data
were verified by further inquiry.

34. Monroe, W. S. “A Survey of the Requirements for the Doctor of
Philosophy in Education,” School and Society, 31:655-61, May 17,
1930.
Data were secured by questionnaire from each of the 29 institutions
granting the doctor's degree in education more than once in the decade
1920-30. Findings were validated by having the respondents read critically
a preliminary report of the study.

35. Nelson, M. G. “Subject Combinations in the Programs of Teachers in
Small Secondary Schools in New York State,” School Review,
37:426-32, June, 1929.
Data were secured from teaching schedules obtained from 210 of the
350 small secondary schools to which requests were sent.

36. Odell, C. W. “The Progress and Elimination of School Children in
Illinois,” University of Illinois Bulletin, Vol. 21, No. 38, Bureau of
Educational Research Bulletin, No. 19. Urbana: University of
Illinois, 1924. 76 pp.
The progress record blanks used in collecting data included such items as
grade entered, grade at present, number of times failed of promotion, and
number of times skipped. Usable records were secured with respect to
53,000 urban elementary pupils, 5500 rural elementary pupils, and 8500 high
school pupils.

37. Ojemann, R. H. “The Constant and Variable Occupations of the
United States in 1920,” University of Illinois Bulletin, Vol. 24, No. 39,
Bureau of Educational Research Bulletin, No. 35. Urbana: University
of Illinois, 1927. 47 pp.
Data were collected from United States Census Reports of 1920. The
findings are compared with those reported in 1914 by Ayres, who used the
1900 Census Reports.

38. Price, R. R. “The Financial Support of State Universities,” Harvard
Studies in Education, Vol. 6. Cambridge: Harvard University Press,
1924. 205 pp.
Data were secured from reports and compilations of Federal and State
legislation, bulletins of the U. S. Bureau of Education and other Federal
departments, annual or biennial reports of state officers, such as auditors,
tax reports, such as those of boards of regents, presidents, and finance
offices, also annual catalogs; histories of the institutions, surveys of the
institutions, general books of reference or textbooks containing
compilations of pertinent data, published proceedings and transactions of
associations of universities and colleges. The investigator presents a
critical review of the sources used with particular reference to their
reliability.

39. Remmers, H. H., and Grant, A. “The Vocabulary Load of Certain
Secondary School Mathematics Textbooks,” Journal of Educational
Research, 18:203-10, October, 1928.
Twelve mathematics texts were analyzed with respect to vocabulary by
taking one line on each of a selected sample of pages until a total of 1000
running words was included. Thorndike’s list of 10,000 most commonly
used words was taken as a criterion for identifying the technical vocabulary.
A second count on a similar basis showed differences that may not be
considered statistically significant.

40. Rufi, John. “The Small High School,” Teachers College, Columbia
University Contributions to Education, No. 236. New York: Bureau
of Publications, Teachers College, Columbia University, 1926.
145 pp.
Five small high schools were studied in detail. Records were examined,
tests were administered, classroom teaching observed, teachers interviewed,
and library and laboratory facilities inspected. In addition, the
communities in which the schools were located were studied.

41. Savage, H. J., et al. “American College Athletics,” The Carnegie
Foundation for the Advancement of Teaching, Bulletin No. 23. New
York: The Carnegie Foundation for the Advancement of Teaching,
1929. 383 pp.
The data collected in this inquiry were obtained by means of interviews.
Five members of the inquiry’s staff visited 130 institutions and consulted
hundreds of students, teachers, alumni, and other persons. In some cases
two or three visits were paid to an institution.

42. Schwegler, R. A., and Winn, Edith. “A Comparative Study of the
Intelligence of White and Colored Children,” Journal of Educational
Research, 2:838-48, December, 1920.
In this study the Stanford-Binet was administered to 34 colored girls and
24 colored boys and to equal numbers of white children drawn at random
from the same school.

43. Selke, Erich. “A Comparative Study of the Vocabularies of Twelve
Beginning Books in Reading,” Journal of Educational Research,
22:369-74, December, 1930.
Beginning books of 12 series of readers were subjected to vocabulary
analysis in this study. “Each word in each book was listed and its
frequency determined.”

44. Smith, H. L. “A Survey of a Public School System,” Teachers College,
Columbia University Contributions to Education, No. 82. New York:
Bureau of Publications, Teachers College, Columbia University,
1917. 304 pp.
This study constitutes a survey of the public schools of Bloomington,
Indiana, in which data were collected during the years 1912-1915. More
attention is given to measurement of pupil achievement than is
characteristic of most surveys of this period. Several standardized tests
were administered.

45. Stuart, Hugh. “The Training of Modern Foreign Language Teachers
in the United States,” Teachers College, Columbia University
Contributions to Education, No. 256. New York: Bureau of Publications,
Teachers College, Columbia University, 1927. 111 pp.
A questionnaire was mailed to 776 universities, liberal arts colleges, and
teachers colleges and replies were received from 412. A second
questionnaire on observation and practice teaching was mailed to 405
institutions and replies were received from 228.

46. Sturtevant, S. M., and Strang, Ruth. “A Personnel Study of Deans
of Girls in High Schools,” Teachers College, Columbia University
Contributions to Education, No. 393. New York: Bureau of
Publications, Teachers College, Columbia University, 1929. 150 pp.
Questionnaire, observation, and interview were used in collecting the
data of this study. “Visits were made to five schools, in which the dean
permitted the observer to sit in her office for a day, observing and recording
her activities, and interviewing her when she was not busy with other
people.”

47. Terman, L. M., et al. “Mental and Physical Traits of a Thousand
Gifted Children,” Genetic Studies of Genius, Vol. I. Stanford,
California: Stanford University Press, 1925. 648 pp.
A group of gifted children were carefully selected on the basis of
intelligence tests as the final criterion. This group was investigated with
respect to many characteristics including racial and social origin;
intellectually superior relatives; health and physical history; school
progress and intellectual history; play interests; reading interests;
intellectual, social, and activity interests; character and personality
traits. Comparisons are made with a group of average children. The report
includes a re-survey of the gifted children two years after the collection
of the initial data. A more recent volume reports the status of the group
of children several years later; see

Terman, L. M., et al. “The Promise of Youth (Follow-up Studies
of a Thousand Gifted Children),” Genetic Studies of Genius, Vol. III.
Stanford, California: Stanford University Press, 1930. 508 pp.

48. Thorndike, E. L., and Bregman, E. O. “On the Form of Distribution
of Intellect in the Ninth Grade,” Journal of Educational Research,
10:271-78, November, 1924.
The distribution of the intelligence test scores of over 14,000 ninth-grade
pupils is given and is shown to be approximately normal by means of
Pearson’s test for goodness of fit.

49. Thorndike, E. L., and Robinson, Eleanor. “The Diversity of High
School Students’ Programs,” Teachers College Record, 24:111-21,
March, 1923.
Tenth-grade pupils in ten school systems indicated in writing what
subjects they were taking during the school year.

50. Webb, P. E. “A Study of Geometric Abilities among Boys and Girls of
Equal Mental Abilities,” Journal of Educational Research, 15:256-62,
April, 1927.
Data were collected in this study by the administration under carefully
controlled conditions of a standardized geometry test to 624 boys and
506 girls in four California high schools. The mental ages of these students
as given by administration of the Terman Group Test of Mental Ability
were secured from school records. All of the geometry test papers were
objectively scored and were rechecked.

51. Wheeler, L. R. “A Comparative Study of the Physical Growth of
Dull Children,” Journal of Educational Research, 20:273-82,
November, 1929.

52. Whipple, G. M. “Sex Differences in Intelligence-Test Scores in the
Elementary School,” Journal of Educational Research, 15:111-17,
February, 1927.
Intelligence test scores from the administration of the National
Intelligence Tests, Scale A, were secured from 2198 elementary school
pupils. Similar data were secured by the administration of the Illinois
General Intelligence Scale to 2501 pupils.

53. Whitney, F. L. “Teacher Demand and Supply in the Public Schools,”
Colorado Teachers College Education Series, No. 8. Greeley, Colorado:
Colorado State Teachers College, 1930. 139 pp.
Data were collected by examination of records, reports, and by means of a
questionnaire sent to superintendents.

54. Woodring, M. N. “A Study of the Quality of English in Latin
Translations,” Teachers College, Columbia University Contributions to
Education, No. 187. New York: Bureau of Publications, Teachers College,
Columbia University, 1925. 84 pp.
One hundred and fifty Latin examination books and the same number of
English examination books from the College Entrance Examination Board
were analyzed in this study. The Hudelson Scale was used in measuring
the quality of the translations and the quality of composition in the English
examination books.

55. Woody, Clifford. “The Arithmetical Backgrounds of Young
Children,” Journal of Educational Psychology, 24:188-201, October, 1931.
An inventory test in arithmetic was administered to approximately
3000 kindergarten, first-grade, and second-grade pupils in 39 different school
systems widely scattered throughout the United States.



CHAPTER IX 


STUDYING THE EFFECT OF A SPECIFIED CHANGE 
IN A GIVEN CAUSE 

General character of the problems. A typical problem is one 
in which a question is asked concerning the effect upon average 
pupil achievement of changing from one method of instruction
to another. The specified change, however, may be in any factor 
that affects pupil achievement such as the length of class period, 
size of class, or textbook used. Other problems call for deter- 
mining the effect of a specified change upon other traits or 
conditions. For example, an investigator may seek the effect 
of a curriculum change upon the proportion of children com- 
pleting the twelfth grade. Sometimes the problem is stated in 
a form that does not make explicit the nature of the questions 
asked. 


1. What is the effect of summer school attendance on scholastic
achievement?
2. What is the value of moving pictures as visual aids in instruction?
3. What is the optimum size of class for the various school subjects?
4. What are the most effective grade placements of curriculum
materials?
5. In what grade should the study of arithmetic begin?
6. To what extent does training in Latin transfer to other fields of
study?

In the first of these statements the specified change in attendance
is from no summer school attendance to attendance. In the
second, value is to be interpreted as meaning effect, and
the problem may be restated, “What is the effect upon pupil
achievement of introducing motion pictures as visual aids?”
The third problem may be thought of as calling for the
determination of the relative effects of various changes in size of
class and the identification of the size of class for which the
achievement is a maximum. The interpretation of the fourth
and fifth problems is similar. In the sixth the specified
change is from no training in Latin to training in the subject
and the effect is achievement in some other field of study.

Table X. Average Spelling Achievement and Minutes per Day
Devoted to Spelling, Seventh Grade (After Rice)

Average Spelling Score    Minutes per Day Devoted to Spelling
        86.5                          60
        80.0                          40
        84.0                          30
        78.0                          30
        77.2                          30
        76.0                          30
        76.7                          25
        84.9                          20
        84.0                          20
        80.6                          20
        79.5                          20
        72.8                          20
        81.1                          15
        90.0                          12
        75.3                          10


Comparative survey method as a means of determining the
effect of a specified change in a given cause. The comparative
survey method of studying the effect of a specified change in a
given cause may be illustrated by Rice's¹ study in which he
attempted to determine the effect upon spelling achievement of
variations in the time allotment for teaching the subject. He
surveyed the spelling achievement of the pupils in a number of
cities by administering the same test to all groups. In the cities
included in this survey the time allotment in the seventh grade
varied from ten minutes per day to sixty minutes per day. The
resulting average spelling scores are given in Table X. Although
the lowest average spelling score in this table corresponds to
twenty minutes per day, the next lowest to ten minutes per
day, and the next to the highest to sixty minutes per day, it is
evident that the correspondence between “minutes per day”
and “average score” is far from perfect. The average spelling
scores for twenty minutes per day are on the whole slightly
greater than those for thirty minutes per day. Hence, one might
conclude that twenty minutes per day was the optimum length
of period for spelling. Such a conclusion, however, would not
have a very sound basis. An attempt to interpret the data of
this table raises the question of the comparability of the several
schools. Although information is not available, it is not
unlikely that the schools differed in several respects which might
affect the average spelling score of the pupils in the seventh
grade. For example, it is likely that more incidental attention
was given to spelling in those schools for which the time
allotment was small than in those for which it was more generous.
There may have been differences in the textbooks used. The
methods of teaching may have varied. Possibly the quality of
the pupil material may not have been the same in all cities.
In view of the uncertainty in regard to the equivalence of such
factors in the several schools, it is obvious that the findings
cannot be considered dependable.

¹ Rice, J. M. Scientific Management in Education. New York: Hinds, Noble
and Eldredge, 1912. Chapters V and VI. The report was first published in The
Forum, April and June, 1897.

Rice’s study illustrates the weakness of the comparative survey
method of determining the effect of a specified change in a
cause. In order to overcome the difficulties noted, it is neces- 
sary to select populations that are equivalent with respect to 
pertinent conditions and to plan the instruction that the pupils 
receive. When this is done, the research is called a controlled 
experiment. 
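
The comparison drawn from Table X can be reproduced with a short calculation. The sketch below is only an illustration: the rows are transcribed from Table X, while the function names and the use of a correlation coefficient as a summary of the "correspondence" are assumptions of this example, not part of Rice's report.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# (average spelling score, minutes per day), transcribed from Table X.
TABLE_X = [
    (86.5, 60), (80.0, 40), (84.0, 30), (78.0, 30), (77.2, 30),
    (76.0, 30), (76.7, 25), (84.9, 20), (84.0, 20), (80.6, 20),
    (79.5, 20), (72.8, 20), (81.1, 15), (90.0, 12), (75.3, 10),
]


def mean_score_by_minutes(rows):
    """Average the city scores that share the same time allotment."""
    groups = defaultdict(list)
    for score, minutes in rows:
        groups[minutes].append(score)
    return {m: round(mean(s), 2) for m, s in sorted(groups.items())}


def pearson_r(rows):
    """Product-moment correlation between score and minutes per day
    (a summary chosen for this sketch, not one used in the source)."""
    scores = [s for s, _ in rows]
    minutes = [m for _, m in rows]
    ms, mm = mean(scores), mean(minutes)
    cov = sum((s - ms) * (m - mm) for s, m in rows)
    var_s = sum((s - ms) ** 2 for s in scores)
    var_m = sum((m - mm) ** 2 for m in minutes)
    return cov / sqrt(var_s * var_m)


if __name__ == "__main__":
    for minutes, score in mean_score_by_minutes(TABLE_X).items():
        print(f"{minutes:>2} min/day: mean score {score}")
    print(f"correlation: {pearson_r(TABLE_X):.2f}")
```

Grouping by time allotment shows the twenty-minute cities averaging slightly above the thirty-minute cities, as the text states, but nothing in the calculation removes the differences among schools that make the comparison undependable.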

An illustration of a controlled experiment.¹ In an attempt
to ascertain the effect of systematic instruction in reading
arithmetical problems upon problem solving ability, two groups of
fifth-grade classes were selected in the public schools of Decatur,
Illinois. In making the selection, classes were chosen so that
the two groups would be approximately equivalent with
reference to both pupil material and teachers. More exact
equivalence of pupil material was secured by administering an
intelligence test and selecting pairs of pupils on the basis of IQ’s.
The equivalence of the groups thus formed was checked with
reference to initial ability in both reading and arithmetic. As a
means of checking upon the equivalence of the teachers of the
two groups, each class was visited. No evidence was noted
which indicated significant differences in the teaching ability
of the two groups. It also appeared that the directions given
to the teachers of the experimental classes were being followed
and that the teachers of the other classes were continuing their
usual plan of instruction which involved no systematic training
in the reading of arithmetical problems.

¹ For a more extended account, see Monroe, W. S., and Engelhart, M. D.
“The Effectiveness of Systematic Instruction in Reading Verbal Problems in
Arithmetic,” The Elementary School Journal, 33:377-81, January, 1933.

In contrast with Rice’s investigation, this study is
characterized by an attempt to secure equivalence of pupil material,
teachers, and other factors that might be expected to affect
problem-solving achievement. In other words, the investigation
qualifies as a controlled experiment.
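
The selection of pairs of pupils on the basis of IQ can be sketched as follows. This is a minimal illustration under stated assumptions: the pupil labels, the IQ values, the tolerance of two points, and the greedy nearest-match rule are all hypothetical and are not taken from the Decatur study.

```python
def match_pairs(experimental, control, tolerance=2):
    """Pair each experimental pupil with the closest unused control pupil.

    Each pupil is a (label, iq) tuple. A pair is kept only when the two
    IQ's differ by no more than `tolerance`; unmatched pupils are dropped.
    """
    remaining = list(control)
    pairs = []
    for pupil in sorted(experimental, key=lambda p: p[1]):
        if not remaining:
            break
        best = min(remaining, key=lambda c: abs(c[1] - pupil[1]))
        if abs(best[1] - pupil[1]) <= tolerance:
            pairs.append((pupil, best))
            remaining.remove(best)
    return pairs


# Hypothetical pupils, for illustration only.
experimental_pool = [("E1", 95), ("E2", 104), ("E3", 110), ("E4", 128)]
control_pool = [("C1", 96), ("C2", 103), ("C3", 111), ("C4", 90)]

if __name__ == "__main__":
    for exp_pupil, ctl_pupil in match_pairs(experimental_pool, control_pool):
        print(exp_pupil, "<->", ctl_pupil)
```

Pupils without a sufficiently close partner (here E4 and C4) are left out of the comparison, which is the price paid for closer equivalence of the two groups.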

Definition of terms. Consideration of controlled experimentation
will be facilitated by introducing certain terms. A variable
designates a changing magnitude such as the total mileage
reading of the speedometer on a motor car, a person’s bank
balance from day to day, his chronological age, the mean
achievement of a class from month to month. The concept of a
variable may be applied to a group of measures such as the
scores of a class on a test, the salaries of a group of teachers, the
size of classes within a school or group of schools, and the like.
In such cases it is necessary to think of the measures as being
arranged in a fixed order. Then the magnitude of the trait or
characteristic will vary from measure to measure. A series of
comparable textbooks or a series of comparable methods of
teaching may also be thought of as a type of variable. In fact,
any trait or characteristic in which a group exhibits individual
differences may be thought of as a variable.
The concept of a variable implies continuity of change, which
is illustrated by the total mileage readings of a speedometer.
As the change is made from one reading to another, all
intermediate values appear. A continuous variable may be measured
to any degree of fineness that the instrument permits. This is
true in the case of a person's height, or weight, or his
chronological age. Size of a class must be measured in terms of
integers. Hence, this variable is not continuous. But for many
purposes it is not essential that a distinction be made between
variables that are continuous and those that are discontinuous.

When the measures of a variable are in quantitative terms,
their relative magnitude defines an order and the series is
designated as an ordered one. Ratings in terms of verbal categories
such as very poor, poor, average, superior, and excellent also
form an ordered series. Some series, however, are unordered. A
group of comparable textbooks illustrates an unordered series.
Teaching methods, types of school organization, and
occupations are other illustrations of an unordered series. The fact
that a series is unordered, however, does not interfere with
thinking of it as defining a variable.
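
The distinction between ordered and unordered series can be made concrete with a small sketch. The rating categories are those named in the text; the textbook labels and the sample data are hypothetical and serve only to illustrate the point.

```python
from collections import Counter

# Ordered series: the verbal categories have a fixed rank.
RATING_ORDER = ["very poor", "poor", "average", "superior", "excellent"]


def rating_rank(rating):
    """Position of a verbal rating within the ordered series."""
    return RATING_ORDER.index(rating)


# Hypothetical ratings and textbook assignments.
ratings = ["superior", "poor", "excellent", "average"]
textbooks = ["Text A", "Text C", "Text A", "Text B"]

# Ratings can be arranged from lowest to highest.
ordered_ratings = sorted(ratings, key=rating_rank)

# Textbooks have no inherent rank; they can only be tallied by category.
textbook_counts = Counter(textbooks)

if __name__ == "__main__":
    print(ordered_ratings)
    print(dict(textbook_counts))
```

Both series still define a variable in the sense used here: each pupil or class takes one value of it, whether or not the values can be ranked.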

A variable that is thought of as being contributed to by 
other variables operating as causes is designated as dependent.
The causal variables are referred to as independent variables or 
as factors of the dependent variable. Hence, an experimental 
problem may be thought of as one of determining the effect 
upon the dependent variable of a specified change in a particular 
independent variable, commonly designated as the experimental 
factor. 


A. Procedure of Experimentation

The general plan of experimental research. The general plan
of experimental research may be illustrated by considering a
problem in which an investigator is seeking the effect upon
pupil achievement of a specified change in some factor
contributing to this dependent variable. When a group of pupils
is subjected to instruction, their average achievement increases
with the passage of time. In order to secure a measure of the
effect upon this growth in achievement of the specified change
in an independent variable, it is necessary to determine the
status that would have been attained if no change had been
made. Hence, a controlled learning experiment requires the
use of a minimum of two groups of pupils, one for each status
of the experimental factor. Sometimes reference is made to
single group experiments, but all that can be accomplished in
such cases is the determination of the status of the dependent
variable resulting from the functioning of the several
independent variables. If a standardized test is used to measure the
achievement specified as the dependent variable, the group to
which the test was administered in the process of standardization
may, under certain conditions, be utilized as the control
group.¹

In a typical experiment two groups of pupils are selected so
that they are equivalent with respect to the achievement
designated as the dependent variable and with respect to all traits
that may be expected to contribute to an increase in this
achievement. Both groups are then subjected to the same
instructional influences except that defined by the experimental
factor. It is customary to apply the label experimental group
to the one for which the status of the experimental factor is
least typical. The other group is called the control. At the end
of the experimental period, both groups are measured with
respect to the dependent variable. The difference in gain or
growth is the effect of the specified change in the experimental
factor.
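
The final step of this plan, taking the difference in gain between the two groups, can be sketched as follows. The scores and the function names are hypothetical, and the sketch assumes that each pupil has an initial and a final score on the same test.

```python
from statistics import mean


def mean_gain(initial, final):
    """Average growth of a group: mean of (final score - initial score)."""
    return mean(f - i for i, f in zip(initial, final))


def experimental_effect(exp_initial, exp_final, ctl_initial, ctl_final):
    """Difference between the mean gain of the experimental group
    and the mean gain of the control group."""
    return mean_gain(exp_initial, exp_final) - mean_gain(ctl_initial, ctl_final)


# Hypothetical scores for two equivalent groups of four pupils each.
exp_initial, exp_final = [40, 45, 50, 55], [52, 56, 63, 65]
ctl_initial, ctl_final = [41, 44, 50, 55], [49, 52, 59, 62]

if __name__ == "__main__":
    effect = experimental_effect(exp_initial, exp_final, ctl_initial, ctl_final)
    print(f"difference in mean gain: {effect}")
```

The control group's gain stands in for the status that would have been attained if no change had been made; the figure has meaning only to the extent that the groups were equivalent and the non-experimental factors were controlled.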

The dependent and independent variables of experimental
problems in the field of education. The dependent variable is
frequently the average achievement² of a group of pupils.
This achievement may be limited to a narrow group of specific
habits or it may include all outcomes of learning in a school
subject or even in several subjects. Any effect, however, may
be taken as the dependent variable. Variability of a group of
pupils, average daily attendance, per cent of high school
graduates entering college, efficiency of a school system, teachers'
salaries, amount of reading done by pupils outside of school,
and the like,³ may be thought of as dependent variables.

Given a dependent variable, the independent variables are
the several causes that contribute to this effect. We have only
fragmentary information in regard to the identity of the causes
contributing to the various effects in the field of education, but
it appears that in some cases the number of independent
variables may exceed twenty. When the effect is a segment of pupil
achievement, the causes are the factors that affect learning.
These include not only such variables as intelligence of pupils,
length of class period, and other traits and characteristics
that are measurable in quantitative terms, but also such
factors as the textbook, method of teaching, and quality of
supervision.

¹ For an illustration in which this was done, see Pratt, H. G., Dunlap, J. W.,
and Cureton, E. E. “The Subject-Matter Progress of Three Activity Schools in
Hawaii, with a Note on Statistical Technique,” Journal of Educational
Psychology, 20:494-99, October, 1929. The formula developed in this reference is
incorrect. For the correct form, see Holzinger, K. J. “The Probable Error of a
Difference Formula,” Journal of Educational Psychology, 21:63-64, January, 1930.

² Courtis has proposed that instead of taking pupil achievement as a dependent
variable “change in the rate of growth” be used. This proposal is supported
by theoretical arguments but certain difficulties are encountered in its practical
application. Courtis, S. A. “The Measurement of the Effect of Teaching,”
School and Society, 28:52-56, 84-88, July 14, July 21, 1928.

³ See bibliography at end of chapter for illustrations of a variety of dependent
variables.

The experimental factor. Theoretically, any of the causes
that contribute to the dependent variable may be studied
experimentally to determine the effect of a specified change, but
examination of a large number of representative experiments
indicates that most experimental factors may be classified under
a few heads. A large number of studies deal with a specified
variation in the learning exercises the pupils are asked to
respond to. A few typical variations are: (1) in teaching
reading in the primary grades, phonics versus no phonics; (2) in
a laboratory science, individual-laboratory exercises versus
lecture-demonstration exercises; (3) in arithmetic, one type of
practice materials versus another type of practice exercises;
(4) intensive reading versus extensive reading; (5) in spelling,
requests to learn and apply rules versus requests to repeat the
spelling of words until memorization has been attained. Many
learning exercises relate to certain materials of instruction.
Hence, the nature of the request is changed when the materials
of instruction are varied. In the case of drill, the number and
distribution of the exercises may be varied.

Motivation procedures and techniques designate another
experimental factor that has been studied by a number of research
workers. This factor may be thought of as consisting of an
unordered series of procedures and techniques designed to secure
more intensive learning activity than would be stimulated by
merely assigning learning exercises. Typical variations are:
(1) use of interesting reading materials versus the use of
materials judged to be less interesting, (2) short daily tests versus
no such tests, (3) definite specific objectives versus general
objectives, (4) information in regard to individual progress
versus no such information, (5) group competition versus
individual competition, (6) reproof versus commendation.

Other factors that have been studied experimentally are
procedures for directing learning, diagnosis and remedial
instruction, class size, length of class period, and classification of
pupils.

Requirements for successful experimentation. As implied in 
the general description of experimental procedure on pages 274- 
75, the basic requirements for dependable findings are (1) selection 
of two or more equivalent groups of subjects, (2) maintenance 
of the specified status of the experimental factor in the experi- 
mental group and in the control group throughout the duration 
of the experiment, (3) control of the various non-experimental 
factors, (4) dependable measures of the dependent variable. 
Experimental studies will be successful to the extent that these 
requirements are met. A prerequisite for securing equivalent 
groups and for controlling the other non-experimental factors is 
identification of the pupil traits and other factors that contribute 
to the dependent variable as defined by the problem. These 
traits and factors will vary with the nature of this variable, 
but a general consideration of the factors that contribute to 
pupil achievement will be helpful. 

The factors contributing to pupil achievement. The research 
relative to the identification and potency of the factors that 
contribute to achievement is fragmentary and some of the find- 
ings do not appear to be highly dependable. However, the 
evidence relative to a number of factors appears convincing. 
As a means of convenience, the factors noted in the following 
pages are grouped under four heads. 

I. Pupil traits. 

II. Teacher factors. 

III. General school factors. 

IV. Extra-school factors. 

I. The significant pupil traits. In a particular case we are 
concerned only with those pupil traits that affect the 
achievement specified by the problem. Obviously, such characteristics 
as color of hair, degree of beauty, and height would not be 
included in any case. On the other hand, general intelligence as 
measured in terms of a point score or of mental age, 
chronological age, and previous achievement in the field of 
experimentation would appear in the list. In some cases, study 
habits, attitudes, and interests and possibly other pupil traits 
should be included, but the evidence in regard to the contribu- 
tions of such factors is very fragmentary and not entirely 
consistent.^ Physical condition, except actual illness, seriously 


1 For example, Symonds, in concluding “a review of research on how to 
study,” states that “the commonly accepted rules of study are often 
non-consequential.” Symonds, P. M. “Methods of Investigation of Study Habits,” 
School and Society, 24: 151, July 31, 1926. 

For information relative to pupil traits see: 

Herriott, M. E. “Attitudes as Factors of Scholastic Success,” University of 
Illinois Bulletin, Vol. 27, No. 2, Bureau of Educational Research Bulletin, No. 47. 
Urbana: University of Illinois, 1929, p. 31. 

Gates, A. I. “A Study of Reading and Spelling with Special Reference to 
Disability,” Journal of Educational Research, 6: 12-24, June, 1922. 

Chambers, O. R. “Measurement of Personality Traits,” Research Adventures 
defective vision, or similar defects, and sex appear to be minor 
pupil factors.^ 

1. There is abundant evidence that general intelligence, as 
measured by typical intelligence tests, influences the 
achievement of children. Many investigators have concluded that it is 
the most important factor.^ Hence, general intelligence (mental 
age, or test score) may be placed at the head of the list of 
significant characteristics of pupil material. 

in University Teaching. Bloomington, Illinois: Public School Publishing 
Company, 1927, pp. 71-80. 

Fleming, C. W. “A Detailed Analysis of Achievement in the High School,” 
Teachers College, Columbia University Contributions to Education, No. 196. 
New York: Bureau of Publications, Teachers College, Columbia University, 
1925. 209 pp. 

Fryer, Douglas. “Interest and Ability in Educational Guidance,” Journal of 
Educational Research, 16: 27-39, June, 1927. 

Ohmann, O. A. “A Study of the Causes of Scholastic Deficiencies in 
Engineering by the Individual Case Method,” University of Iowa Studies in Education, 
Vol. 3, No. 7. Iowa City: University of Iowa, 1927. 58 pp. 

Pressey, S. L. “An Attempt to Measure the Comparative Importance of 
General Intelligence and Certain Character Traits in Contributing to Success in 
School,” Elementary School Journal, 21: 220-29, November, 1920. 

^ For example see 

Hoefer, Carolyn, and Hardy, M. C. “The Influence of Improvement in 
Physical Condition on Intelligence and Educational Achievement,” Twenty-Seventh 
Yearbook of the National Society for the Study of Education, Part I. Bloomington, 
Illinois: Public School Publishing Company, 1928, pp. 371-87. 

Hall, Irene, and Crosby, Amy. “A Study of the Causes of Inferior Scholarship 
of Pupils in Low First Grade,” Journal of Educational Research, 14: 375-83, 
December, 1926. 

Mallory, J. N. “A Study of the Relation of Some Physical Defects to 
Achievement in the Elementary School,” George Peabody College for Teachers, 
Contributions to Education, No. 9. Nashville: George Peabody College for Teachers, 
1922. 78 pp. 

Stalnaker, E. M., and Roller, R. D., Jr. “A Study of One Hundred 
Non-Promoted Children,” Journal of Educational Research, 16: 265-70, November, 1927. 

Westenberger, E. J. “A Study of the Influence of Physical Defects upon 
Intelligence and Achievement,” The Catholic University of America, Educational 
Research Bulletin, Vol. 2, No. 9. Washington: The Catholic Education Press, 
1927. 53 pp. 

2 For example, see Heilman, J. D. “Factors Determining Achievement and 
Grade Location,” The Pedagogical Seminary and Journal of Genetic Psychology, 
36: 435-57, September, 1929. 

For a comprehensive account of the influence of general intelligence upon 
school achievement, see Terman, L. M., et al. “Nature and Nurture, Their 
Influence upon Achievement,” Twenty-Seventh Yearbook of the National Society 
for the Study of Education, Part II. Bloomington, Illinois: Public School 
Publishing Company, 1928. 397 pp. 

3 The IQ might have been listed as a pupil characteristic instead of mental 
2. The significance of chronological age becomes apparent 
when a child having a mental age of twelve years and a 
chronological age of ten years is compared with one whose 
corresponding ages are twelve and fifteen. The first child has an 
IQ of 120 and the second one, an IQ of 80. Although the two 
children have equivalent mental ages, the first one is bright 
and the second is “dull.” The significance of chronological age 
is further shown by a comparison of two children of the same IQ 
but of different chronological ages. Although the children are 
equally “bright,” the difference in mental ages as well as the 
differences in physiological and social maturity emphasize the 
importance of chronological age as a factor contributing to 
school achievement. An excellent discussion of the influence of 
chronological age, or the maturity of which it is an index, has 
been provided by Commins.^ 
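
The two IQ figures quoted above follow from the ratio definition of the intelligence quotient: mental age divided by chronological age, multiplied by 100. The following is a minimal sketch in Python. The function names are illustrative and do not come from the text, and the ages in the last line are hypothetical, chosen only to show that equal IQs at different chronological ages imply different mental ages.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: 100 times mental age over chronological age (same units)."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age / chronological_age


def mental_age_from_iq(iq: float, chronological_age: float) -> float:
    """Invert the ratio: the mental age implied by an IQ at a given age."""
    return iq * chronological_age / 100.0


# The two children described in the text: equal mental ages, different IQs.
print(ratio_iq(12, 10))  # first child
print(ratio_iq(12, 15))  # second child

# Hypothetical ages (not from the text): equal IQs, different mental ages.
print(mental_age_from_iq(120, 10), mental_age_from_iq(120, 5))
```

Because the IQ is fully determined by the two ages, recording mental age and chronological age separately loses nothing.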

3. Previous achievement ^ is a significant characteristic of the 
pupil material when it functions as a prerequisite for the 
learning involved in the experiment. For example, ability to read 
functions as a tool in learning arithmetic, geography, history, 
literature, and the like. Certain abilities in arithmetic and 
algebra function as tools in the study of chemistry, and 
achievement in chemistry may contribute to achievement in physics. 
Achievement in the first year of a foreign language functions 
as a tool in the more advanced study of that language. It would 
be easy to enumerate a large number of cases in which abilities 
engendered in a school subject function later in the learning of 
that subject or related subjects. 

Abilities that function as a prerequisite for learning in one 
school subject may, or may not, be significant for learning in 
another school subject. For example, achievement in the first 

age, or intelligence test score. When both mental age, or intelligence test score, 
and chronological age are separately considered, the pupil is more adequately 
characterized and the IQ is superfluous. 

^ Commins, W. D. “Maturity and Education,” Educational Research Bulletin, 
Vol. 3, No. 7, Catholic University of America. Washington: Catholic University 
Press, 1928, p. 36. 

2 The total outcome of learning includes general patterns of conduct as well 
as specific habits and knowledge. Among the possible outcomes are study habits 
which are not included here under the head of “previous achievement.” 
year of a given foreign language would be of more significance 
in an experiment in the second year of that language than it 
would be in an experiment in a different language. Achievement 
in the first year of a foreign language would probably be of 
negligible significance in an experiment that involved learning 
typewriting as the dependent variable. The previous 
achievement of children becomes of increasing importance as a factor 
in the achievement of the experiment in proportion to the extent 
to which the children have experienced subject-matter similar 
in content to that of the experiment. 

II. Teacher factors that affect pupil achievement. Research 
has yielded little dependable information in regard to the 
teacher factors that contribute to pupil achievement.^ Hence, 
any list of teacher factors must be recognized as a hypothesis. 
Amount of training, teaching experience, intellectual status, 
and personality are usually listed as important teacher factors, 
but they influence pupil achievement for the most part 
indirectly through their contributions to more immediate factors. 
Hence, the following list appears more useful.^ 

1. Instructional techniques. 

a. Devising and assigning learning exercises, 
b. Motivation procedures, 
c. Directive procedures, 
d. Diagnostic and remedial procedures. 

2. Classroom-management procedures. 

3. Skill in carrying out instructional techniques and 
classroom-management procedures. 

4. Zeal of the teacher with reference to experimental factor. 

1 Corey, S. M. “The Present State of Ignorance about Factors Effecting 
Teacher Success,” Educational Administration and Supervision, 18: 481-90, 
October, 1932. 

^The authors are aware of the widespread conviction that the personality 
of the teacher is an important educative factor, but such traits as breadth of 
interest, self-control, good judgment, leadership, forcefulness, honesty, adapta- 
bility, enthusiasm, and open-mindedness contribute to the teacher’s instructional 
techniques and to his skill in the use of them and, hence, influence pupil 
achievement indirectly. The teacher’s personality may make a direct contribution, but 
since experimental evidence is lacking and we do not have satisfactory means 
for measuring personality traits, it does not appear wise to include them in a list 
of teacher factors to be considered in experimental investigations. For a study 
1. The attention given to methods courses in the professional 
training of teachers is evidence of a conviction that the 
instructional techniques employed by a teacher affect the 
achievement of pupils. This conviction is supported by some indirect 
evidence from investigations of the relation between the marks 
received by teachers in courses on methods of teaching and 
teaching success.^ Since “instructional techniques” is a general 
designation, a classification under four captions is suggested: 
(a) devising and assigning learning exercises, (b) motivation 
procedures, (c) directive procedures, (d) diagnostic and remedial 
procedures. Recognition of these rubrics will enable an 
experimenter to be more certain in regard to the control of 
non-experimental factors under the general head of instructional 
techniques. 

2. Classroom-management procedures include such items as 
taking the roll, distributing and collecting materials, starting the 
work of the period, dismissing the class in case the pupils go to 
another room at the end of the period, and dealing with 
disciplinary cases. The importance of these procedures is generally 
recognized. In fact, until recent years the teacher’s ability as a 
disciplinarian was considered to be the most important of his 
qualifications. Although other aspects of teaching are now 
considered of more importance than the mere maintenance of order, 
adequate attention to routine matters of classroom management, 
inclusive of discipline, is regarded as essential for securing an 
environment that will facilitate learning. If, however, distinctly 
undesirable practices are avoided, it appears likely that 
variations in classroom-management procedures will not materially 
affect pupil achievement. 

3. The effectiveness of an instructional technique or a 
classroom-management procedure depends upon the skill with which 

of the relationship of certain traits to teaching success, see Morris, E. H. 
“Personal Traits and Success in Teaching,” Teachers College, Columbia University 
Contributions to Education, No. 342. New York: Bureau of Publications, 
Teachers College, Columbia University, 1929. 75 pp. 

^ Knight, F. B. “Qualities Related to Success in Teaching,” Teachers College, 
Columbia University Contributions to Education, No. 120. New York: Bureau of 
Publications, Teachers College, Columbia University, 1922, p. 42. 
it is carried out. Although we have no means of obtaining precise 
measures of teaching skill, it is obvious that some teachers are 
more skillful in carrying out certain instructional techniques 
than are other teachers. When a new technique is being 
compared with a familiar one, it is likely that the new one will be 
applied less skillfully. For example, suppose an experiment is 
devised to determine the effect of supervised study in 
comparison with study without supervision. Suppose further that 
the plan of supervising study has been formulated in detail. 
If a teacher, who has become a skillful instructor under a plan 
that does not involve supervised study and who has not had 
experience in supervising study, attempts to teach one class 
employing supervised study and another without supervised 
study, it is reasonable to expect that he will be considerably 
more skillful in teaching the second class. If this is the case, the 
experiment would furnish a comparison between skillful 
teaching without supervised study and teaching with supervised study 
somewhat crudely carried out. Hence, the experiment would 
not yield satisfactory evidence of the relative merits of skillful 
teaching with supervised study and skillful teaching without 
supervised study. 

It is difficult to demonstrate the influence of teaching skill 
because we have no satisfactory means for measuring this 
teacher factor. However, it not infrequently happens that 
variation in teaching skill appears the most probable explanation 
of relatively large gains in achievement. In a recent study of the 
effect of increasing the number of weekly tests in teaching 
spelling from two to three,^ the findings for one grade group showed 
an extraordinary superiority of the two-test plan. In attempting 
an explanation of this result, the authors suggest teaching skill 
as an influential factor. 

4. The zeal that a teacher exhibits in carrying out the in- 
structional techniques he is employing is a subtle factor. It is 
related to the factor of skill, and perhaps the two overlap to 

1 Gates, A. I., and Bennett, C. C. “Two Tests versus Three Tests Weekly in 
Teaching Spelling,” Elementary School Journal, 34: 44-49, September, 1933. 
some extent, but there is evidence that indicates the presence of 
an important educative factor that differs in some respects from 
skill. The influence upon pupil achievement of the teacher’s 
preference ^ in regard to methods is indicated in the report ^ of 
an experiment to determine the relative merits of instructional 
procedures that may be designated as Method A and Method B. 
Several teachers cooperated in the experiment, each teaching a 
class according to Method A and another class according to 
Method B. The following results were secured: 



                                Number     Mean     Mean Scholastic 
                                           Score         Grade 
Pupils taught by Method A         417      71.5          83.9 
Pupils taught by Method B         440      69.5          83.8 
Gain in favor of Method A                   2.0 

The teachers were asked to indicate which method they preferred. 
The following results were obtained when the data were 
tabulated according to the preference of the teachers: 

Teachers Preferring Method A 

                                Number     Mean     Mean Scholastic 
                                           Score         Grade 
Pupils taught by Method A         131      75.0          84.8 
Pupils taught by Method B         140      59.3          82.4 
Gain in favor of Method A                  15.7 

Teachers Preferring Method B 

                                Number     Mean     Mean Scholastic 
                                           Score         Grade 
Pupils taught by Method A         180      68.2          85.4 
Pupils taught by Method B         178      72.2          85.2 
Gain in favor of Method B                   5.0 



1 It is reasonable to expect that a teacher will exhibit greater zeal when 
employing a method that he believes in than when employing one that he does not 
like. 

^ Parr, R. M., and Spencer, M. A. “Should Laboratory or Recitation Have 
Precedence in the Teaching of High-School Chemistry?” Journal of Chemical 
Education, 7: 571-86, March, 1930. 
Teachers Having No Preference 

                                Number     Mean     Mean Scholastic 
                                           Score         Grade 
Pupils taught by Method A          80      67.0          82.7 
Pupils taught by Method B          89      67.2          83.0 
Gain in favor of Method B                   0.2 



The differences between the mean scores of the several pairs 
of groups strongly suggest that the preference of the teachers in 
regard to the method of teaching affected the achievements of 
the pupils. If it is assumed that the preference in regard to 
methods affected the zeal of the teachers, it follows that this 
characteristic of teaching was an important educative factor. 
Several investigations ^ contribute evidence in support of this 
conclusion. 
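
The comparison drawn from the tables can be retraced with a few lines of arithmetic. The sketch below, in Python, takes the pupil counts and mean scores as they are printed in the tables above and recomputes the difference between the Method A and Method B means for each group of teachers. The dictionary layout, labels, and function name are illustrative assumptions, not part of the original report.

```python
# (number of pupils, mean score) as printed in the tables; labels are illustrative.
results = {
    "all teachers":            {"A": (417, 71.5), "B": (440, 69.5)},
    "teachers preferring A":   {"A": (131, 75.0), "B": (140, 59.3)},
    "teachers preferring B":   {"A": (180, 68.2), "B": (178, 72.2)},
    "teachers, no preference": {"A": (80, 67.0),  "B": (89, 67.2)},
}


def difference_a_minus_b(group):
    """Mean score under Method A minus mean score under Method B."""
    (_, mean_a), (_, mean_b) = group["A"], group["B"]
    return round(mean_a - mean_b, 1)


gains = {label: difference_a_minus_b(group) for label, group in results.items()}

for label, gain in gains.items():
    if gain > 0:
        favored = "Method A"
    elif gain < 0:
        favored = "Method B"
    else:
        favored = "neither method"
    print(f"{label:24s} A - B = {gain:+5.1f}  (favors {favored})")
```

The sign of the difference follows the teachers' preference in both preference groups and nearly vanishes where no preference was stated, which is the pattern the text interprets as an effect of zeal. Recomputed from the printed means, the difference for teachers preferring Method B comes to 4.0, whereas the gain line of that table reads 5.0; one of the printed figures may be a misprint or a transcription slip.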

III. General school factors that affect pupil achievement. Pupil 
achievement is affected directly or indirectly by several 
general school factors. For example, it is generally assumed that 
the textbook used in a course influences the achievement of the 
pupils. Much of this influence is indirect. For example, the 
character of the text influences the learning exercises assigned, 
which in turn influence achievement. In the following list of 
general school factors no attempt is made to indicate whether a 
factor functions directly or indirectly. 

1. Instructional materials (textbooks, library, maps, laboratory 
apparatus, etc.). 

2. Time devoted to learning activity. 

3. Concomitant training. 

4. Size of class. 

5. Administration and supervision. 

1 Sexton, E. K., and Herron, J. S. “The Newark Phonics Experiment,” 
Elementary School Journal, 28: 690-701, May, 1928. 

Collings, Ellsworth. An Experiment with a Project Curriculum. New York: 
The Macmillan Company, 1923. 346 pp. 

Pittman, M. S. The Value of School Supervision. Baltimore: Warwick and 
York, Inc., 1921. 129 pp. 

Knight, F. B. “Qualities Related to Success in Teaching,” Teachers College, 
Columbia University Contributions to Education, No. 120. New York: Bureau of 
Publications, Teachers College, Columbia University, 1922, p. 9. 
1. Instructional materials, such as textbooks, library, and 
other school equipment, influence the learning activity of pupils 
through the learning exercises that they furnish or make 
possible. Texts in arithmetic, algebra, language, physics, and most 
of the other subjects furnish a number of learning exercises. 
Texts and other books make possible other learning exercises, 
such as requests to study certain pages or questions whose 
answers may be found by reading. In a similar way charts, 
maps, moving picture machines, laboratory apparatus, and the 
like, affect the number and type of learning exercises that may 
be assigned. Hence, the achievement of the pupils is likely to be 
affected by the instructional materials used with a class. 

The intimate relation between instructional materials and 
learning exercises may make it impossible to have the former 
constant when the latter are greatly different. It should be 
noted, also, that certain types of learning exercises require 
certain instructional materials. Hence, if the purpose of an 
experiment is to compare two types of learning exercises, such as 
the demonstration lecture and individual laboratory work, the 
materials must differ. In such cases, the difference in instruc- 
tional materials is essentially a phase of the experimental 
factor. 

2. If the time devoted to learning activity is assumed to be an 
index of the amount of exercise of modifiable connections, it is 
apparently an important educative factor.^ In group 
experiments the total time devoted to learning activity is affected by 
absences, but the length of the class period and the number of 
minutes per day devoted to study are more important unless 
there is a marked difference in the number of absences in the 
two groups. The time devoted to study should include that 

1 This statement appears to be defensible even though research has revealed 
a low correlation between school attendance and achievement. 

Odell, C. W. “The Effect of Attendance upon School Achievement,” Journal 
of Educational Research, 8: 422-32, December, 1923. 

Denworth, K. M. “The Effect of Length of School Attendance upon Mental 
and Educational Ages,” Twenty-Seventh Yearbook of the National Society for the 
Study of Education, Part II. Bloomington, Illinois: Public School Publishing 
Company, 1928, pp. 67-91. 
devoted to thinking and talking about assignments as well as 
formal study, either at school or at home. 

3. Pupil achievement in a given field may be affected by 
concomitant training in other fields. If two fields are closely 
related, what is learned in one may be an asset for learning in the 
other. Even when the fields are not closely related, there may 
be a transfer of study habits.^ 

4. The size of the class disappears as an educative factor in an 
experiment where equivalent groups are secured by pairing, 
since this procedure secures classes of equal size. If the two 
groups are not equal in size, small differences do not appear to 
be significant because, within fairly wide limits, size of class does 
not appear to be an important educative factor.^ 
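
The paragraph above notes that pairing yields classes of equal size. A minimal sketch of one way such pairing might be carried out is given below in Python. The passage itself does not prescribe a procedure, so the matching variable (a hypothetical mental age in months), the rule of pairing adjacent pupils in rank order, the chance allocation within each pair, and all names and scores are assumptions made only for illustration.

```python
import random


def pair_into_groups(scores, seed=0):
    """Form two equal-size groups by pairing pupils on a matching score.

    `scores` maps a pupil identifier to the matching variable. Pupils are
    ranked on it, adjacent pupils are paired, and chance decides which
    member of each pair joins the experimental group. With an odd number
    of pupils, the highest-ranked pupil is left unpaired.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance allocation within the pair
        experimental.append(pair[0])
        control.append(pair[1])
    unpaired = ranked[-1:] if len(ranked) % 2 else []
    return experimental, control, unpaired


# Hypothetical mental ages in months for seven pupils.
mental_age_months = {"A": 118, "B": 131, "C": 125, "D": 140,
                     "E": 122, "F": 133, "G": 150}

experimental, control, unpaired = pair_into_groups(mental_age_months)
print(experimental, control, unpaired)
```

Whatever the allocation within pairs, the two groups necessarily contain the same number of pupils, so class size cannot differ between them.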

5. The administration and supervision of a school must be an 
important factor in learning activity if the attention given to 
these fields in teacher-training institutions is any criterion. 
However, it is difficult, if not impossible, to find any quantitative 
evidence in regard to the contribution of this factor to 
classroom learning. The reason for this seems to lie in the fact that 
any influence exerted by administration or supervision must be 
an indirect one operating through the teacher, the course of 
study, the organization of classes, the provision of school 
equipment, and the like. 

IV. Extra-school factors that affect pupil achievement. Pupil 
achievement may be affected by several factors that have not 
been included in the preceding lists. The following appear to 
deserve consideration: 

1. Participation in extra-class activities. 

2. The pupil’s home life. 

3. Community interest in and attitude toward the school. 

1 For studies revealing transfer of this type, see: 

Gatchel, D. F. “Results of a How-to-Study Course Given in High School,” 
School Review, 39: 123-29, February, 1931. 

Hurd, A. W. Problems of Science Teaching at the College Level. Minneapolis: 
University of Minnesota Press, 1929. 195 pp. 

Kornhauser, A. W. “Changes in the Information and Attitude of Students in an 
Economics Course,” Journal of Educational Research, 22: 288-98, November, 1930. 

2 See reference on page 318 to the research by Hudelson on class size as an 
educative factor. 
1. Participation in extra-class activities makes demands upon 
a pupil’s time and when much time is devoted to such activities, 
the amount of learning outside of the class is likely to be 
affected. Under wise supervision, however, participation in 
extra-class activities may be beneficial to learning rather than 
detrimental.^ Dramatic, scientific, technical, and debating clubs 
not only add interest to the school subjects to which they are 
related, but they may also contribute directly to achievement in 
certain fields. 

2. The child’s home life may influence his school achievement 
in many ways. Listening to conversation of parents and other 
members of the family, reading periodicals and books that the 
home affords, and traveling with members of the family are 
activities that may contribute to school achievement by provid- 
ing a background of information for the learning that is to take 
place during the experiment. Topics in history, civics, biology, 
literature, and economics are more meaningful to the pupil who 
has had related experiences through travel. It is impossible to 
estimate the extent to which school achievement is influenced 
by these out-of-school experiences. 

Parental supervision of home study probably affects school 
achievement,^ but it is probable that the attitude of parents 
toward the school as an educative agency is a more potent 
influence than any supervision they may administer. 

3. School achievement is influenced by community interest 
in and attitude toward the school. If the community is high in the 
socio-economic scale, the members of the community are likely 
to show much interest in school affairs and to cooperate with the 
principal and teachers in attaining the best conditions for school 
work. For example, the parents of such a community may co- 
operate with the school faculty in providing more adequate 


1 Monroe, W. S. “The Effect of Participation in Extra-Curricular Activities 
on Scholarship in the High School,” School Review, 37: 747-52, December, 1929. 

2 Reavis, W. C. “Some Factors That Determine the Habits of Study of Grade 
Pupils,” Elementary School Teacher, 12: 71-81, October, 1911. 

Brooks, E. C. “The Value of Home Study under Parental Supervision,” 
Elementary School Journal, 17: 184-94, November, 1916. 





library facilities. In other cases, the community may be per- 
meated with attitudes antagonistic toward the school and its 
administration. Such attitudes among parents tend to be 
acquired by pupils. Thus, community attitudes and interest may 
exert a subtle but possibly powerful influence on school learning. 

Details of setting up and conducting controlled experiments. 
Controlled experiments vary in details and hence what may be 
said relative to the technique to be followed in one experiment 
may not be wholly applicable to other experiments. The 
discussion in the following pages assumes an experimental group and 
a control group, each of which may involve two or more 
subgroups or classes. Except as explicitly indicated, the dependent 
variable is thought of as the average of a segment of pupil 
achievement and the experimental factor is one that affects 
learning. A person who attains an understanding of the 
procedures discussed should not encounter serious difficulty in 
making adaptations to problems that involve other dependent 
variables and other experimental factors. The discussion is 
organized under the following captions: (1) defining the 
problem, (2) securing a sample of school children representative of 
the population for which a conclusion is desired, (3) securing 
equivalent groups of pupils from this representative sample, 
(4) controlling the other non-experimental factors during the 
period of experimentation, (5) conducting the experiment, 
(6) measuring the dependent variable. 

1. Defining the problem. In defining an experimental 
problem, attention should be given to five points: (1) nature and 
scope of the dependent variable, (2) nature and scope of 
experimental factor and the change to be made in it, (3) status 
of non-experimental factors, (4) duration of the application of 
the change in the experimental factor, (5) population for which 
a conclusion is desired. 

The nature and scope of the dependent variable is frequently 
given only casual attention, but specification of these items is 
important. When the dependent variable is pupil achievement, 
it may be the sum of all types of outcomes of learning or it may 
be limited only to certain ones.^ For example, the outcomes of 
learning activity in the field of history include additions to the 
pupil’s vocabulary, memorized facts, knowledge of events and 
their causes, points of view, attitudes, prejudices, interests, and 
study habits. In addition to the variations in definition sug- 
gested by this analysis, achievement in this field may differ with 
respect to topics studied (ground covered). The mere 
specification of “achievement in history” leaves the matter indefinite. 
If the change in the experimental factor is from heterogeneous 
to homogeneous grouping of pupils for instructional purposes, 
the dependent variable to be measured may be the amount of 
memorized information, total achievement, including attitudes 
and interests, average rate of progress (ground covered), or 
some other resultant of schooling. If the specified change is 
from the demonstration-lecture method to the individual- 
laboratory method, a number of dependent variables is possible, 
including the election of the science as a field of specialization. 
The choice of a test to measure the dependent variable implies 
specifications in regard to its nature and scope, but since indirect 
measurement is possible, the implications are uncertain. An 
experimental problem is not adequately defined until the nature 
and scope of the dependent variable are explicitly specified. 

The experimental factor and the variation to be made in it 
must be defined with precision in order that the consequent 
change in the dependent variable may be ascribed to a definite 
cause. For example, a conclusion that Method A is superior to 
Method B is not very meaningful if the investigator defines 
these methods only by saying that they are the methods carried 
out in the experiment. In some cases, the nature of the ex- 
perimental factor is such that it is relatively easy to give a 
precise definition of it. For example, Douglass compared the 


1 Although achievement is commonly thought of as outcomes of learning or 
the results of teaching, in most fields a large portion of what achievement tests 
measure is the same thing as is measured by general intelligence tests. Hence, 
strictly speaking, achievement tests do not yield pure measures of the results of 
teaching. This point is referred to again in considering the measurement of the 
dependent variable. 
effectiveness of the recite-study sequence with the study-recite 
sequence where the divided period plan is used.^ The reader of 
the report of the experiment has no difficulty in understanding 
the nature of the experimental factor and the change made in it. 

When the experimental factor is complex, as is the case in 
many problems, it is difficult to plan the investigation so that
the experimenter will be able to ascribe the change in the
dependent variable to a specific cause. For example, if the
assignment method is being compared with the project method, the
change in the experimental factor involves so many phases that
the interpretation of the findings cannot be made very specific.
On the other hand, if the procedures to be followed in both the
control and experimental groups are defined so that they differ 
in only one detail, abnormal or even unsound pedagogical condi- 
tions may be created and the conclusion will have only limited 
application. The dilemma suggested by these statements creates 
a serious difficulty in experimental research. Morrison has stated 
that it is “exceedingly difficult to raise an issue in the teaching
process which is sufficiently definite.”² When the experimental
factor is complex, it may be possible to analyze the problem into
a series of more restricted problems. The study of a single
problem in this series will not have much practical significance 
and hence a person who plans a study of a complex problem 
should be prepared to deal with each of the restricted problems 
that the analysis may reveal. 

The procedures defined by the experimental factor should 
have a common function, and this function should agree with 
the specifications of the dependent variable. Two methods of 
teaching may not have identical functions. For example, the
lecture method may be, and probably usually is, directed toward 
engendering information and what is commonly designated as 

1 Douglass, H. R. “The Experimental Comparison of the Relative
Effectiveness of Two Sequences in Supervised Study,” University of Oregon
Publications, Vol. 1, No. 4. Eugene: University of Oregon, 1927, pp. 173-218.

2 Morrison, H. C. “The Major Lines of Experimentation in the Laboratory
Schools,” Supplementary Educational Monographs, No. 24. Chicago: University
of Chicago Press, 1923, p. 5.

understanding. The class discussion method affords opportunity
to engender ability to deal with thought questions, to develop
ability in organizing arguments, and the like. Hence, these two
methods may be conceived of in a form such that they have
different functions. When this is the case, an experimental study 
of their relative effectiveness cannot be successful because the 
specification of a dependent variable compatible with one func- 
tion will operate to the disadvantage of the other. 

The point made in the preceding paragraph is important and 
recognition of the requirement of community of function of the
procedures being compared would eliminate many ill-advised 
experiments. A critical examination of reports of experimental 
studies published since 1920 would reveal a surprising number 
in which the functions of the procedures compared are suffi- 
ciently dissimilar to make the inquiry not unlike a study of the 
relative effectiveness of two tools such as the hatchet and the 
saw. 

An experimenter determines the effect of the specified change 
for a particular status of the several non-experimental factors.
For example, a determination of the effect of a change in the
number of minutes per day devoted to the teaching of spelling 
will be for pupils of a certain grade level and intellectual status, 
for a particular text, for a particular method of instruction, for a 
particular amount of incidental instruction, etc. Hence, in 
defining the problem there must be specifications relative to the 
status of the several non-experimental factors for which the 
determination is to be made. 

The necessity of restricting the experimental factor indicates 
that a problem frequently analyzes into a series of related 
problems. The specification of the status of the non-experimental
factors suggests further analysis. There is a problem for
every variation in the status of each of the factors that functions
as a cause of the dependent variable. Hence, the definition of a
problem that appears to ask only a single question may reveal
an extended series of questions, each of which requires
experimental study. The meaning of this statement will become more
apparent to the reader if he will attempt to define a few typical
problems. 
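
One way of seeing how a single question expands into a series is to enumerate the combinations of statuses directly. The sketch below is illustrative only: the factor names and their levels are hypothetical and are not drawn from any study cited in this chapter.

```python
from itertools import product

# Hypothetical statuses of three non-experimental factors (illustrative only).
non_experimental_factors = {
    "grade": [3, 4, 5],
    "intellectual_status": ["dull", "average", "bright"],
    "text": ["Text A", "Text B"],
}


def enumerate_problems(factors):
    """Return one restricted problem per combination of factor statuses."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


problems = enumerate_problems(non_experimental_factors)
print(len(problems))  # 3 * 3 * 2 = 18 restricted problems
```

Even three factors with two or three statuses each yield eighteen distinct determinations, which suggests why a problem that appears to ask one question seldom does.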

The effect sought is one resulting from the operation of the 
specified change in the experimental factor over a period of 
time. For example, in investigating the effect of changing the 
daily time allotment for the teaching of spelling from ten minutes 
to thirty minutes, we seek the effect of this change not for a 
single day or a single week, but for a semester or year or even a 
longer period. Hence, in defining the problem, the period during 
which the change is to be operative should be specified. Some- 
times the effect of the change may be of such a nature that it is 
not observable until the change has been operative for a
considerable period. For example, when homogeneous grouping
versus heterogeneous grouping is made the basis of experimenta- 
tion, it is conceivable that the effect upon pupil attitudes may 
not be apparent at the end of one semester, and the experimenter 
should be interested in the effect of this change in the grouping 
of pupils over a period of years. In many cases a change in 
method of teaching will not become fully effective until the 
pupils become acquainted with the new method and the teacher 
becomes skillful in its application. 

The effect may be sought for a single pupil or a small group of 
pupils, but usually the researcher will be interested in the
average effect for a large group or universe. Data are necessarily
collected from a particular population, but we are interested in
the determination primarily with reference to its application to
other similar groups. In other words, we usually desire a
generalized conclusion. Hence, the problem should be thought of
as that of determining a generalized conclusion in regard to the 
average effect of a specified change in the experimental factor 
operating during a specified period and for a defined status of 
the other non-experimental factors. 
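
The elements of a definition enumerated in this section can be gathered into a simple record, so that an omitted specification is noticed before the experiment begins. This is a minimal sketch under stated assumptions: the class name, the field names, and the `missing` helper are hypothetical conveniences and form no part of the procedure described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemDefinition:
    """Hypothetical record of the specifications a defined problem requires."""
    dependent_variable: str = ""
    experimental_factor: str = ""
    variation: str = ""
    period: str = ""
    population: str = ""
    non_experimental_status: dict = field(default_factory=dict)

    def missing(self):
        """Names of the elements that have been left unspecified."""
        return [name for name, value in vars(self).items() if not value]


# The spelling example from the text, with two elements still undefined.
spelling = ProblemDefinition(
    dependent_variable="spelling achievement",
    experimental_factor="daily minutes devoted to spelling",
    variation="ten minutes versus thirty minutes",
    period="one semester",
)
print(spelling.missing())  # ['population', 'non_experimental_status']
```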

2. Securing a sample of school children representative of the
population for which a conclusion is desired. Application of the
findings for a given group to another group requires that the
investigator secure a group of school children that is sufficiently
representative to justify the generalization. One means of
approximating representative sources of data in educational
research is by the method of random sampling, but this technique
is seldom feasible in experimental investigations, since the
members of a random group would be scattered among the total
population or universe and it would be difficult if not impossible
to bring the several pupils together for the experimental
instruction. Usually the investigator can only select a group that is
judged to be representative.

It is contended by Lindquist¹ that in many evaluations of
instructional methods or materials some variation from
representativeness of groups will not necessarily invalidate
generalizations with respect to the effectiveness of the methods or
materials compared. It is evident, however, that such variations
cannot be permitted to be very great. Conclusions derived from groups
of bright pupils may not safely be applied to average or dull
pupils. The data obtained from dull pupils may not safely be
used as a basis for generalizations which are to apply to average
or bright pupils. In the reports of the supervised study
experiments of Breed² and of Breslich³ it is stated that the duller
pupils profited most, while the bright pupils were not helped and,
in some cases, were handicapped. If one of the experimenters
had used only bright pupils and the other dull ones, the
conclusions would be in opposition even though each may be a
dependable basis for a generalization restricted to the population
represented.

Large samples are likely to be more representative than small
ones. Hence, an experimenter should endeavor to secure
relatively large groups for his study. No minimum number of
pupils can be specified. In an experiment to determine the
effect of variations in size of class there should be several hun- 
dred pupils. In a study to determine the relative merits of two 

1 Lindquist, E. F. “The Standard Error of the Means of ‘Matched’ Samples,”
Journal of Educational Psychology, 22:197-204, March, 1931.

2 Breed, F. S. “Measured Results of Supervised Study,” School Review,
27:186-204, 262-84, March, April, 1919.

3 Breslich, E. B. “Teaching High School Pupils How to Study,” School
Review, 20:505-15, October, 1912.

instructional procedures a smaller number may be satisfactory.
In many cases two or more parallel experiments are more
desirable than a single large group experiment. For example, if the
resources of the experimenter permit including several hundred
pupils in a study of the relative merits of two instructional
procedures, a group of experiments should be planned rather
than one large experiment.

3. Securing equivalent groups of pupils.¹ The control of pupil
factors is accomplished by securing a control group that is
equivalent to the experimental group. Complete equivalence is
secured by matching the pupils in the two groups with reference
to the traits that affect the dependent variable of the experiment.
Precise matching with respect to more than one or two traits
materially reduces the size of the groups because perfect mates
cannot be found for many of the pupils. Hence, usually the
matching is on the basis of one or two measures such as
intelligence test score² or initial achievement.³ Occasionally
pupils are matched on the basis of a composite score.⁴ Olander⁵
paired pupils chiefly on the basis of their growth in arithmetical
ability during a preliminary period in which all pupils were
subjected to the same or similar instruction. Courtis⁶ has proposed
a technique somewhat similar to that of Olander, but which is 

1 For a more comprehensive discussion of the techniques used in securing
equivalent groups, see Engelhart, M. D. “Techniques Used in Securing
Equivalent Groups,” Journal of Educational Research, 22:103-09, September, 1930.

2 The following experiments are illustrations of this technique:

Anibel, F. G. “Comparative Effectiveness of the Lecture-Demonstration and
the Individual-Laboratory Method,” Journal of Educational Research, 13:356,
May, 1926.

Ullrich, O. A. “The Effect of Required Themes on Learning,” Journal of
Educational Research, 14:296, November, 1926.

3 Burks, J. D., and Stone, C. R. “Relative Effectiveness of Two Different
Plans of Training in Silent Reading,” Elementary School Journal, 29:433,
February, 1929.

4 Douglass, H. R. “The Experimental Comparison of the Relative
Effectiveness of Two Sequences in Supervised Study,” University of Oregon
Publications, Vol. 1, No. 4. Eugene: University of Oregon, 1927, pp. 173-218.

5 Olander, H. T. “Transfer of Learning in Simple Addition and Subtraction,”
Elementary School Journal, 31:358-69, 427-37, January and February, 1931.

6 Courtis, S. A. “Criteria for Determining Equality of Groups,” School and
Society, 35:874-78, June 25, 1932.

See also Courtis, S. A. “Maturation Units for the Measurement of Growth,”
School and Society, 30:1683-90, November 16, 1929.

more refined. If it is not feasible to classify the pupils into
matched groups at the beginning of the experiment, matched
groups may be selected when computing the mean gains. This
procedure is likely to result in a material decrease in the number
of cases but it has the advantage of being applicable to regular
classes. 
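
The loss of cases that accompanies precise matching can be seen in a small sketch. What follows is one possible pairing rule, not the procedure of any investigator cited above: the function name, the tolerance of two score points, and the scores themselves are assumptions made for illustration.

```python
def match_pairs(control, experimental, tolerance=2):
    """Pair pupils whose scores differ by no more than `tolerance`.

    Each argument is a list of (pupil_id, score) tuples. Pupils for whom
    no mate can be found are dropped, which is why matching shrinks groups.
    """
    remaining = sorted(experimental, key=lambda p: p[1])
    pairs = []
    for c_id, c_score in sorted(control, key=lambda p: p[1]):
        # Closest experimental pupil who has not yet been matched.
        best = min(remaining, key=lambda p: abs(p[1] - c_score), default=None)
        if best is not None and abs(best[1] - c_score) <= tolerance:
            pairs.append((c_id, best[0]))
            remaining.remove(best)
    return pairs


# Hypothetical intelligence test scores.
control = [("c1", 100), ("c2", 110), ("c3", 125)]
experimental = [("e1", 101), ("e2", 111), ("e3", 140)]
pairs = match_pairs(control, experimental)
print(pairs)  # [('c1', 'e1'), ('c2', 'e2')]
```

Pupils c3 and e3 find no mate within the tolerance and drop out, so six pupils yield only two matched pairs. Matching on a second trait would require a second tolerance and would reduce the number of pairs further.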

Several experimenters have considered the groups to be 
sufficiently equivalent when the means of the measures of the 
trait or traits were equal. Some have sought also equality of
measures of variability.¹ Although the equivalence of the
groups secured in these ways is not as precise as that obtained
by means of matched pairs, it is likely that reasonably
satisfactory control of pupil factors is attained, especially when the
variabilities of the groups are considered. A practical advantage
of the procedure is that the labor of handling the data is greatly
reduced. When it is not feasible to secure equivalent groups,
Melby and Lien² have proposed the use of three or more regular
classes without modification. Intelligence and achievement 
tests are administered to determine the initial status of these 
classes. The experimental procedure is then applied to the class 
whose initial status is superior to some of the classes, but is 
inferior to the others. If the final achievement of this class is 
superior to the final achievement of the initially superior class, 
or classes, then it may be argued with considerable justification 
that the method represented by the experimental procedure is 
relatively superior to that employed with the initially superior
class. If, on the other hand, the final achievement of the class
to which the experimental procedure was applied is below that
of the initially inferior class, or classes, it may be argued that the
method represented by this procedure is relatively inferior. 
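
The reasoning attributed to Melby and Lien may be sketched as a decision rule. The function below is a reading of the paragraph above and not their published procedure; in particular, requiring the experimental class to surpass all of the initially superior classes (or to fall below all of the initially inferior ones) is an assumption, and the class means are hypothetical.

```python
def melby_lien_verdict(initial, final, experimental_class):
    """Interpret final standing in the light of initial standing.

    `initial` and `final` map each class name to its mean score.
    """
    e_initial = initial[experimental_class]
    e_final = final[experimental_class]
    others = [c for c in initial if c != experimental_class]
    initially_superior = [c for c in others if initial[c] > e_initial]
    initially_inferior = [c for c in others if initial[c] < e_initial]

    if initially_superior and all(e_final > final[c] for c in initially_superior):
        return "relatively superior"
    if initially_inferior and all(e_final < final[c] for c in initially_inferior):
        return "relatively inferior"
    return "no clear interpretation"


# Hypothetical class means; class B receives the experimental procedure.
initial = {"A": 48, "B": 52, "C": 56}
verdict = melby_lien_verdict(initial, {"A": 60, "B": 71, "C": 68}, "B")
print(verdict)  # relatively superior
```

When the experimental class finishes between the other classes, as it began, the rule yields no clear interpretation, which is the principal limitation of the technique.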

In a few specialized types of experiments the groups may be 


1 This technique is exemplified in Brooks, F. D. “The Transfer of Training in
Relation to Intelligence,” Journal of Educational Psychology, 16:415, September,
1924.

2 Melby, E. O., and Lien, Agnes. “A Practicable Technique for Determining
the Relative Effectiveness of Different Methods of Teaching,” Journal of
Educational Research, 19:255-59, April, 1929.

selected by a process of random sampling. In a study by the
senior author¹ to ascertain how pupils solve arithmetical
problems, four groups were secured by having four tests
arranged in alternate order before they were distributed to pupils
as they were seated in the classrooms. The four groups selected
in this way were large and the assumption of equivalence seems
to be justified.
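
Distribution of test forms in alternate order amounts to assigning every fourth pupil, in seating order, to the same group. The sketch below illustrates that idea only; the function name and the pupil labels are hypothetical, and the details of the original study's distribution are not reproduced here.

```python
def distribute_in_rotation(seated_pupils, n_forms=4):
    """Assign pupils to groups by handing out forms 1..n_forms in rotation."""
    groups = {form: [] for form in range(1, n_forms + 1)}
    for position, pupil in enumerate(seated_pupils):
        groups[position % n_forms + 1].append(pupil)
    return groups


# Ten hypothetical pupils in seating order.
pupils = [f"pupil_{i}" for i in range(1, 11)]
groups = distribute_in_rotation(pupils)
print(groups[1])  # ['pupil_1', 'pupil_5', 'pupil_9']
```

The assumption of equivalence rests on seating order being unrelated to the traits that affect the dependent variable, and on the groups being large.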

Reeder² sought to secure equivalence by employing a rotation
technique which involved interchanging the experimental and
controlled groups at the middle of the experimental period. This
procedure really divides the investigation into two
sub-experiments. The data are organized so that the gains in achievement
of both groups of pupils under the influence of the experimental
procedure may be compared with the gains in achievement
of both groups under the influence of the control procedure.
With reference to the total investigation this technique gives 
two groups which are equivalent in the sense that they are made 
up of the same pupils. However, the two groups are not neces- 
sarily equivalent when considered with reference to achievement 
in the field of experimentation, study habits, and possibly some 
other factors. If the problem is to determine the relative effects 
of procedures for directing study, a carry-over of study habits 
is to be expected in the case of the pupils who receive such 
instruction during the first half of the experimental period. A 
similar carry-over would, of course, be impossible in the case 
of the other group. Hence, this rotation technique would fail 
to secure groups equivalent with respect to study habits even 
though they consist of the same pupils. 
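
The argument of the preceding paragraph may be put in symbols. The notation is introduced here for illustration and is not the authors'. It assumes, as a simplification, that a carry-over of study habits adds a constant amount $c$ to the later gain of the group that received the experimental procedure first, and that the two halves are weighted equally.

```latex
% Group A: experimental procedure first, control procedure second.
% Group B: control procedure first, experimental procedure second.
% g_E, g_C : gains the two procedures would produce without carry-over (assumed)
% c        : carry-over added to Group A's gain under the control procedure (assumed)
\begin{align*}
\text{pooled experimental gain} &= \tfrac{1}{2}\,(g_E + g_E) = g_E \\
\text{pooled control gain}      &= \tfrac{1}{2}\,\bigl(g_C + (g_C + c)\bigr) = g_C + \tfrac{c}{2} \\
\text{observed difference}      &= g_E - g_C - \tfrac{c}{2}
\end{align*}
```

Under these assumptions the observed difference understates the true difference $g_E - g_C$ by $c/2$ whenever $c > 0$, although both pooled groups consist of the same pupils.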

The most precise equivalence will usually be attained when 
the groups consist of matched pairs, but when such a procedure 
is not feasible or is judged undesirable, one of the other techniques

1 Monroe, W. S. “How Pupils Solve Problems in Arithmetic,” University of
Illinois Bulletin, Vol. 26, No. 23, Bureau of Educational Research Bulletin, No. 44.
Urbana: University of Illinois, 1929. 31 pp.

2 Reeder, E. H. “A Method of Directing Children's Study of Geography,”
Teachers College, Columbia University Contributions to Education, No. 193.
New York: Bureau of Publications, Teachers College, Columbia University,
1925. 98 pp.

may be employed. If the equivalence is to be judged
only with reference to means and standard deviations, a
procedure suggested by Rulon and Croon¹ may be employed. When
the groups of the experiment are not equivalent, it may be
possible to calculate the effect of the lack of control of the pupil
factors,² or, as suggested by Melby and Lien, dependable
interpretation may be possible in spite of the lack of equivalence.

4. Controlling the other non-experimental factors. This phase
of the experimental procedure introduces the assumptions
that control of non-experimental factors is possible and that it
can be accomplished without creating conditions which are
significantly abnormal or pedagogically unsound. These
assumptions appear reasonable in the case of a number of experimental
problems, but in other cases their validity is not certain. When
the experimental factor is related to certain non-experimental
factors so that a change in the former requires changes in the
latter, the assumptions can be satisfied only by analyzing the
experimental factor and planning a series of experiments.

A non-experimental factor of considerable importance is
that of the zeal or effort of the teacher.³ Preference for a method
because of its novelty or because it is a current fad in education
or because it is advocated by persons occupying positions of
prominence is apt to stimulate a teacher to greater zeal in
applying it than that with which the pupils of the other group are
taught. Some degree of control of this subtle factor may be
secured by carefully prepared instructions to the teachers,
especially those having experimental groups, and by endeavoring
to engender in the teachers a scientific attitude toward the
investigation. A contribution to the control of the zeal of the

1 Rulon, P. J., and Croon, C. W. “A Procedure for Balancing Parallel
Groups,” Journal of Educational Psychology, 24:585-90, November, 1933.

2 For illustrations see:

Haefner, Ralph. “Casual Learning of Word Meanings,” Journal of
Educational Research, 25:267-77, April-May, 1932.

Westfall, L. H. “A Study of Verbal Accompaniments to Educational Motion
Pictures,” Teachers College, Columbia University Contributions to Education,
No. 617. New York: Bureau of Publications, Teachers College, Columbia
University, 1934. 67 pp.

3 See pages 283-85.

teacher may be made by visiting the classroom, observing the
teaching, and commenting to the teacher in a later conference on
the extent to which her teaching appeared to be suitably zealous.

The skill of the teacher is another important non-experimental
factor. If there is reason for believing that the teachers may not
be equally skillful in instructing the experimental and control
groups, a period of practice is desirable.

Frequently, control of non-experimental teacher factors is 
attempted by having the same teacher instruct both an
experimental and a control group. The success of this technique is
dependent upon the teacher being equally skillful and equally
zealous in instructing the two groups. This condition may
prevail, but the teacher may prefer either the experimental
procedure or the control procedure or carry them out with different
degrees of skill. In any case a requisite for highly effective
instruction is that the teacher believe that she is employing a
good procedure. Hence, having an experimental group and a
control group taught by the same teacher does not insure 
adequate control of teacher factors. 

A method of rotation has also been employed as a means of 
neutralizing possible variations in teacher factors. According
to this plan, at the middle of the experimental period a teacher
who has been instructing an experimental group exchanges with
one who has been teaching a control group.¹ Thus, it is argued
that any difference in zeal or skill on the part of the two teachers
will be neutralized by the fact that both the experimental group
and the control group have received an equal amount of stimula- 
tion and direction from each teacher. This procedure will be 
successful in securing equivalence of these factors only when the 
two teachers are equally skillful and equally zealous in carrying 
out the procedures prescribed for the two groups. A teacher 
may be equally skillful and equally zealous in carrying out 
different procedures, but it appears likely that most teachers 
because of their lack of familiarity with or a dislike for one of the 

1 It should be noted that this exchange involves also a change of the teacher
relative to the experimental factor.

procedures will teach with less skill and zeal in one of the groups
than in the other. When this occurs the rotation procedure will
not succeed in securing control of skill and zeal except by
chance. 
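
A small numerical sketch may make the condition for successful rotation plainer. It assumes, purely for illustration, that each teacher's skill and zeal with a given procedure can be expressed as a single rating and that the ratings of the two teachers simply add for the group they both teach. The ratings themselves are hypothetical.

```python
def teacher_contribution(skill, procedure):
    """Sum of both teachers' ratings with the procedure a group receives."""
    return sum(skill[teacher][procedure] for teacher in skill)


def rotation_imbalance(skill):
    """Teacher advantage of the experimental group over the control group."""
    return (teacher_contribution(skill, "experimental")
            - teacher_contribution(skill, "control"))


# Teachers differ from each other, but each is equally skillful with both procedures.
uniform = {
    "T1": {"experimental": 5, "control": 5},
    "T2": {"experimental": 3, "control": 3},
}
# Each teacher is more skillful or zealous with the experimental procedure.
biased = {
    "T1": {"experimental": 5, "control": 3},
    "T2": {"experimental": 4, "control": 3},
}
print(rotation_imbalance(uniform))  # 0
print(rotation_imbalance(biased))   # 3
```

In the first case rotation neutralizes the difference between the teachers; in the second it does not, because the inequality lies within each teacher's handling of the two procedures and accompanies her from one group to the other.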

Another important non-experimental factor is the amount 
of time spent by the pupils in learning activity. The two groups 
of pupils should spend equal amounts of time in study and
recitation. One of the best techniques that has been suggested for
the control of learning time is that of providing teachers with
detailed instructions in regard to assignments and the
supervision of learning activity.¹ However, even wisely prepared
instructions will not insure control of this factor, especially
when the pupils are expected to participate in learning activity
outside of the classroom. Furthermore, attempts to control
learning time are likely to create artificial conditions that are in
opposition to principles of good teaching. 

When instructional techniques are being compared, it is 
important that the instructional materials be the same for both 
the experimental and control groups, and when instructional 
materials are being compared it is important that the instruc- 
tional techniques be the same for both groups. Instructional 
techniques and materials of instruction, however, are closely 
related. It is impossible to have the latter constant when the 
former are greatly different. For example, certain types of 
learning exercises require certain types of materials of
instruction. If the purpose of the experiment is to compare two types
of learning exercises, such as demonstration-lectures and 
individual-laboratory work, the materials must differ. In such 
cases, the difference in materials of instruction is essentially a 
phase of the experimental factor. 

5. Conducting the experiment. The preceding discussion
of the control of non-experimental factors has included some
reference to conducting the experiment. There are, however, 

1 For an excellent illustration of the use of such plans, see Coryell, N. G. “An
Evaluation of Extensive and Intensive Teaching of Literature,” Teachers
College, Columbia University Contributions to Education, No. 275. New York:
Bureau of Publications, Teachers College, Columbia University, 1927, p. 13.

certain points that should be emphasized. Wise planning is
essential but not sufficient. The plans must be carried out.
When several teachers are involved, the experimenter should
prepare relatively detailed instructions for them to follow. If
feasible, the teachers should be brought together for a conference 
at which the instructions are explained. Unless each teacher is 
instructing both an experimental group and a controlled group, 
separate conferences should be held with the two groups of 
teachers. In any case the instructions should be reduced to 
writing and a copy given to each teacher. 

It is desirable that the investigator keep in close touch with 
the work throughout the duration of the experiment. Even 
when the instructions have been wisely formulated, conditions 
may arise such that strict conformity with them may not be 
compatible with good teaching. If possible, such cases should be 
reported to the investigator who should make the decision in 
regard to the procedure to be followed. As a means of obtaining
a record of what occurs day by day, each teacher may be asked
to keep a diary. If possible, the experimenter should visit each
class a number of times. Information relative to details of
instruction may be very useful in interpreting the experimental
findings. For example, Brownell¹ has reported an experiment
in which the interpretation of certain findings was materially
changed when certain instructional details were recalled.

6. Measuring the dependent variable. When an experimental
problem is adequately defined, the nature and scope of the
dependent variable will be specified. The research worker faces
the task of selecting or devising an instrument to measure the
specified variable, or more specifically the change in it. Usually
this instrument will be an achievement test. In selecting or
devising a test for use in an experimental study, testing time,
convenience in scoring, and expense per pupil are minor
considerations. Since the effect of variable errors upon a mean
varies inversely as the square root of the number of cases, the 

1 Brownell, W. A. “An Evaluation of an Arithmetic ‘Crutch,’” Journal of
Experimental Education, 2:5-34, September, 1933.

reliability of the instrument and its validity, as measured by the
correlation of the test scores with a criterion, are of secondary
importance. Systematic errors of measurement are controlled
through the administration of the test and other aspects of
testing conditions. Hence, the principal consideration in
selecting or devising a measuring instrument for use in an experiment
is that the systematic errors of validity be a minimum. 
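
The contrast between variable and systematic errors drawn above can be illustrated with the familiar expression for the error of a mean. The symbols are introduced here for illustration and are not the authors'; the expression assumes that the variable errors of the several pupils are independent of one another.

```latex
% \sigma_e : standard deviation of the variable error in a single score (assumed symbol)
% N        : number of pupils in the group
% b        : systematic error common to all scores in the group (assumed symbol)
% \bar{X}  : observed mean;  \bar{T} : true mean;  \bar{e} : mean of the variable errors
\begin{align*}
\sigma_{\bar{e}} &= \frac{\sigma_e}{\sqrt{N}} \\
\bar{X} &= \bar{T} + b + \bar{e}
\end{align*}
% Example: \sigma_e = 5 and N = 100 give \sigma_{\bar{e}} = 5 / 10 = 0.5,
% while b enters the mean undiminished whatever the value of N.
```

Enlarging the group thus reduces the influence of an unreliable instrument upon the mean but does nothing to reduce a systematic error.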

Current practices in experimental studies suggest that an 
objective test is essential. It is contended that an objective 
test measures the same thing as an essay examination within 
the same field¹ and that the measures are more accurate
measures because the scoring of the essay examination is subjective.
Studies of the marking of examination papers exaggerated the
unreliability of examination grades.² Furthermore, it has been
shown that by exercising care in formulating the questions
and by formulating rules for scoring, differences between the
grades assigned by different teachers can be greatly reduced.³
Hence, the increase in reliability attained by using an objective 
test is much less than is commonly believed. 

The evidence relative to the community of function of 
objective tests and essay examinations is not necessarily

1 For example, see Wood, B. D. Measurement in Higher Education.
Yonkers-on-Hudson: World Book Company, 1924.

Paterson, D. G. “Do New and Old Type Examinations Measure the Same
Functions?” School and Society, 24:246-48, August 21, 1926.

Corey, S. M. “The Correlation between New Type and Essay Examination
Scores and the Relationship between Them and Intelligence as Measured by
Army Alpha,” School and Society, 32:849-50, December, 1930.

Gilliland, A. R., and Misbach, L. E. “Relative Values of Objective and Essay
Type Examinations in General Psychology,” Journal of Educational Psychology,
24:349-61, May, 1933.

2 Monroe, Walter S., and Souders, Lloyd B. “The Present Status of Written
Examinations and Suggestions for Their Improvement,” University of Illinois
Bulletin, Vol. 21, No. 13, Bureau of Educational Research Bulletin, No. 17.
Urbana: University of Illinois, 1923. 77 pp.

3 Osburn, W. J. “Testing Thinking,” Journal of Educational Research,
27:401-11, February, 1934.

Peters, C. C., and Martz, H. B. “A Study of the Validity of Various Types of
Examinations,” School and Society, 33:336-38, March 7, 1931.

Sims, V. M. “Improving the Measuring Qualities of an Essay Examination,”
Journal of Educational Research, 27:20-31, September, 1933.

Stalnaker, J. M., and R. C. “Reliable Reading of Essay Tests,” School
Review, 42:599-605, October, 1933.

convincing,¹ especially when the requirements of an experiment
are considered. In such studies the purpose is to secure a
measure of the change in achievement rather than of the status
of achievement, and a test that is highly valid for measuring
the latter may be a poor instrument for measuring the change
in achievement.² Wiedemann and Neivens³ reported that a
true-false test measures approximately 60 per cent of the same
thing as a “compare-and-contrast” essay test.⁴ In view of
the fact that in most fields achievement as measured includes
what we call general intelligence as a large factor, it appears
that the community of function of these two tests exclusive
of this factor is materially less. Hence, when considered with
reference to change in achievement, the two types of
instrument should not be accepted as measuring the same thing.
The change in achievement resulting from instruction will 
be in the factor that is not general intelligence. Hence the 
difference between mean score at the beginning of an experi- 
ment and that at the end is heavily weighted by a factor that 
is not influenced by instruction. At best the difference between 
the mean scores probably minimizes the growth due to instruc- 
tion. The measurement by means of an objective test is likely 
to be indirect, and it is this condition that creates the possi- 
bility of errors of validity. Measurement by means of an essay 
examination can be and usually is much more direct. Hence, 
it seems justifiable to conclude that in many cases an essay 


1 For a strong argument on this point, see Cason, Hulsey. “The Essay
Examination and the New Type Test,” School and Society, 34:413-18, September
26, 1931.

2 For some evidence on this point, see Watson, Goodwin. “Note on Validity
in the Measurement of Change,” Journal of Educational Research, 27:187-92,
November, 1933.

3 Wiedemann, C. C., and Neivens, L. F. “Does the ‘Compare-and-Contrast’
Essay Test Measure the Same Mental Functions as the True-False Test?”
Journal of General Psychology, 9:430-49, October, 1933.

4 A similar degree of community of function is reported for the “discuss”
essay test and simple fact answer test and for the “explain” essay test and
word-answer test.

Cochran, R. E., and Weidemann, C. C. “‘Explain’ Essay vs. Word-Answer
Fact Test,” The Phi Delta Kappan, 17:59-61, December, 1934.

examination will be a superior instrument for measuring gain
in achievement. 

In devising a test for use in an experiment, an attempt 
should be made to formulate exercises that will call for the func- 
tioning of the abilities or traits specified as the dependent 
variable. It is relatively easy to approximate direct measure- 
ment of the more specific abilities such as motor skills and 
fixed associations. It is more difficult to secure satisfactory 
measures of knowledge achievement and generalized controls 
of conduct such as skill in reflective thinking, attitudes, ideals, 
and interests. The real test of a pupil's knowledge achievement
is his ability to deal with difficulties and new situations. Hence, 
an instrument for the measurement of knowledge achievement 
should consist of thought questions. The test of general
patterns of conduct is the consistency of conformity to the
pattern. Hence, a single formal test cannot be expected to furnish
adequate evidence of such achievement. The measurement of
the acquisition of the study habits resulting from certain types
of instruction has been attempted by measuring the knowledge 
achievement at the end of the experimental period. The real 
test of the acquisition of study habits is to be found in the con- 
formity of the students to the procedures after the completion 
of the period of instruction. 

Systematic errors of validity are due to fluctuations in the 
ratio of the mean of what is measured directly to the mean of 
the abilities or traits whose measurement is desired. Hence, 
when indirect measurement cannot be avoided, the experi- 
menter should endeavor to select or devise an instrument such 
that this ratio will be the same for the control group as for 
the experimental group. This ratio will not be the same if the
test favors either group. Usually it is not possible to secure ob- 
jective evidence relative to this point and hence the experimenter 
must rely upon his judgment. A coefficient of validity cannot 
reveal the presence of a systematic error. Furthermore, it 
should be emphasized that a test or type of test that is shown 
to be satisfactory for one purpose is not necessarily equally 
satisfactory in another situation. Usually the requirements
of an experiment are highly specialized, and hence a test that
is reported as satisfactory for a general survey of pupil achieve-
ment or for some other purpose may be a very poor instrument
for measuring the dependent variable of an experiment.

Handling experimental data.¹ The usual method of handling
the data obtained from a controlled experiment involving two
groups involves no elaborate statistical procedures. The
measurement of the dependent variable at the beginning and
end of the experimental period yields two sets of data for each
group of pupils. One procedure for handling these data is to
calculate the following means:

$M_{C1}$ = mean of first measures of control group
$M_{C2}$ = mean of second measures of control group
$M_{E1}$ = mean of first measures of experimental group
$M_{E2}$ = mean of second measures of experimental group

The gain in the dependent variable is found by subtracting
the mean of the first measure from that of the second.²

$G_C = M_{C2} - M_{C1}$ = mean gain of control group
$G_E = M_{E2} - M_{E1}$ = mean gain of experimental group³

The experimental difference, D, is obtained by subtracting
$G_C$ from $G_E$.

When the initial status of the dependent variable can be 
assumed to be zero, no test is administered at the beginning of
the experimental period and

$D = M_{E2} - M_{C2}$

The experimental difference, D, if dependable, represents the 
effect of the specified change in the experimental factor. 
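
The calculation just described may be illustrated by a brief sketch in Python. The scores are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative and form no part of the procedure itself. It is assumed that the initial and final measures are comparable, as the plan requires.

```python
from statistics import mean


def mean_gain(first_measures, second_measures):
    """Mean gain of one group: mean of second measures minus mean of first."""
    return mean(second_measures) - mean(first_measures)


def experimental_difference(control_first, control_second,
                            experimental_first, experimental_second):
    """D = G_E - G_C, the experimental gain minus the control gain."""
    g_c = mean_gain(control_first, control_second)
    g_e = mean_gain(experimental_first, experimental_second)
    return g_e - g_c


def experimental_difference_zero_start(control_second, experimental_second):
    """D = M_E2 - M_C2, for use when initial status is assumed to be zero."""
    return mean(experimental_second) - mean(control_second)


# Hypothetical scores for three pupils in each group (not data from any study).
control_first = [10.0, 12.0, 14.0]
control_second = [15.0, 16.0, 17.0]
experimental_first = [11.0, 12.0, 13.0]
experimental_second = [18.0, 20.0, 22.0]

g_c = mean_gain(control_first, control_second)
g_e = mean_gain(experimental_first, experimental_second)
d = experimental_difference(control_first, control_second,
                            experimental_first, experimental_second)
print(g_c, g_e, d)
```

With these figures the mean gain of the control group is 4, that of the experimental group is 8, and D is 4. Averaging the gains of the individual pupils gives the same values whenever every pupil has both measures.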

¹ The plan described here is sometimes supplemented by the application of
other techniques. Descriptions will be found in the illustrative references at the
end of the chapter.

² It is necessary that the measures be comparable. The initial and final tests
should represent equivalent forms. If they are not equivalent forms, but are
equally valid with respect to the experimental achievement, conversion of the
initial and final measures into standard scores, T-scores, age scores, or grade
scores makes possible the calculation of mean gains. See pages 82 f.

³ These gains may also be obtained by calculating the gains of individual
pupils and averaging them.
B. Interpretation of Experimental Findings

Determining the dependability of an experimental difference 
by verification. An experimental difference is to be regarded as 
dependable when repetitions of the study yield similar dif-
ferences. This statement suggests that an experimenter should
endeavor to establish the dependability of his findings by re-
peating the study. Although this method may not often be
feasible, it is to be recommended. Sometimes it is possible for
the investigator to plan a group of similar experiments rather
than a single large one. Barr¹ has reported a suggestive illus-
tration. A total of sixty-four subjects were involved in the study.
Four similar experiments were conducted, each experimenter
working with sixteen subjects. Comparison of the differences
from the four experiments indicates their dependability. Al-
though an experiment with as few as sixteen subjects is seldom
to be commended, our confidence in the dependability of the
reported differences is increased by the method employed.

Estimating the dependability of an experimental difference 
for the population of the experiment. When verification by rep-
etition of the experiment is not feasible, a judgment in regard
to the dependability of the obtained difference may be arrived
at from a critical examination of the experimental procedure
and the data collected. The reasoning involved is similar to
that described for comparative surveys in the preceding chapter.
In fact, an experiment is a comparative survey of selected
populations which have been subjected to controlled educative
influences.

The calculated difference represents the effect of the change
made in the experimental factor plus the effects of the faults
of the data and of failure to control completely the non-
experimental factors. Hence, the problem is to determine
whether the combined effect of imperfect control of the non- 

¹ Barr, A. S. “A Study of the Amount of Agreement Found in the Results of
Four Experimenters Employing the Same Experimental Technique in a Study
of the Effects of Visual and Auditory Stimulation on Learning,” Journal of
Educational Research, 26: 35-45, September, 1932.
experimental factors and of data faults has been sufficient to
give the obtained difference a sign opposite to that of the net 
difference. The obtained difference is regarded as dependable 
when it appears that the net difference would have the same 
sign. The reader should not confuse dependability with prac- 
tical significance. An experimental difference is calculated 
from means and the limitations of averages are applicable.
Furthermore, a very small net difference means that the change 
in the experimental factor has little effect. 

The effect of failure to control non-experimental factors 
cannot be calculated, but an experienced investigator will 
usually be able to identify those non-experimental factors
whose control is important in a particular experiment. The 
effect of inadequate control can only be estimated but usually 
it may be possible to make a dependable determination of its 
sign. General school factors and extra school factors should be 
critically examined for lack of control, but usually the most 
important variations in non-experimental factors are to be
found in those that relate to the teacher. Teaching zeal and 
skill are especially difficult to control, and differences in them 
may make a significant contribution to the obtained difference.

Of the four types of error, variable errors of measurement 
and variable errors of validity are not likely to affect the dif- 
ference very much, since the effect of these errors upon a mean 
is inversely proportional to the square root of the number of 
cases and a portion of this effect is likely to cancel out in the
subtractions. When the test does not measure directly all 
phases of the specified pupil achievement, the systematic effect 
may be a matter of considerable importance. For example, 
experiments designed to determine the relative effectiveness
of the individual-laboratory and lecture-demonstration methods 
of teaching a science have been criticized by pointing out that 
the test used did not measure directly some of the outcomes 
claimed for individual laboratory work, and hence favored the 
lecture-demonstration method.
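
The statement that the effect of variable errors upon a mean is inversely proportional to the square root of the number of cases may be made concrete by a short sketch. It assumes, for illustration only, that the variable error of a single measure has a standard deviation of 12 score points; the function names and figures are hypothetical. The sketch also shows why a systematic error behaves differently: a constant bias shifts the mean by its full amount whatever the number of cases.

```python
import math
from statistics import mean


def variable_error_of_mean(sd_single_measure, n_cases):
    """Variable error of a mean shrinks as 1 / sqrt(number of cases)."""
    return sd_single_measure / math.sqrt(n_cases)


def mean_with_constant_bias(scores, bias):
    """A systematic error of `bias` points on every score moves the mean by `bias`."""
    return mean([score + bias for score in scores])


# Assumed standard deviation of the variable error of one measure: 12 points.
for n_cases in (16, 64, 256):
    print(n_cases, variable_error_of_mean(12.0, n_cases))

# Hypothetical scores; a constant bias of 2 points is not reduced by averaging.
scores = [50.0, 60.0, 70.0]
print(mean(scores), mean_with_constant_bias(scores, 2.0))
```

Quadrupling the number of cases halves the variable error of the mean, while the bias of 2 points appears undiminished in the mean of the biased scores.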

If, as the result of the experimenter’s consideration of the
control of non-experimental factors and data faults, it appears
likely that the net difference has the same sign as that of the ob-
tained difference and does not approximate zero, the calculated
difference is designated as dependable. On the other hand, if
it appears at all likely that the net difference has the opposite
sign, the obtained difference must be labeled as lacking in de-
pendability. Frequently it is not possible to make a dependable
judgment with respect to the sign of the net difference. In such
cases the dependability of the obtained difference should be
regarded as uncertain.

Estimating the dependability of the obtained difference when 
considered with reference to a larger population or universe. 
The obtained difference is for the population included in the 
experiment. Usually we are interested in the net difference
for a larger population or universe. If the experimental popula-
tion is not representative, this condition may contribute to the
obtained difference and hence affect its dependability when
considered with reference to the larger population or universe.
It is a common practice to calculate the probable error of the
difference between the mean gains of two groups by means of
the formula¹

$PE_D = \sqrt{PE_{M_1}^2 + PE_{M_2}^2}$

The result of the calculation is then compared with the ob-
tained difference. McCall² has proposed a plan of handling
experimental data which culminates in the ratio
$D / (2.78\,\sigma_D)$, which
he calls the experimental coefficient (EC). The reader is then
told that an experimental coefficient of 1.00 means that “we
can be practically certain that the true difference is somewhere
above zero.” Although in another chapter McCall considers

¹ The corresponding formula for the standard error of the difference is

$\sigma_D = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$

Sometimes the long formula (see page 105) is used, but almost never with
adequate recognition of the assumptions and implications involved in its use.

² McCall, W. A. How to Experiment in Education. New York: The Macmillan
Company, 1923, pp. 140, 155.
the control of non-experimental factors, he gives the impression
by this statement that when the experimental coefficient is
found to be 1.00 or greater, the experimenter is justified in
concluding that the net difference for the universe has been
demonstrated to have the same sign as the obtained difference.¹
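
The quantities named in this discussion may be computed as in the following sketch. Several assumptions are made and should be noted: the probable error is taken as 0.6745 times the standard error, as holds for a normal distribution; the experimental coefficient is taken as D divided by 2.78 times the standard error of the difference, the form in which McCall's ratio is usually quoted; and the critical ratio is taken as D divided by its probable error. The function names and the figures are hypothetical. The short formula treats the two means as independent, and the sketch inherits that assumption.

```python
import math

# Assumption: probable error = 0.6745 x standard error (normal distribution).
PE_FACTOR = 0.6745


def standard_error_of_mean(sd, n_cases):
    """Standard error of a mean from the standard deviation and number of cases."""
    return sd / math.sqrt(n_cases)


def error_of_difference(error_m1, error_m2):
    """Short formula: sqrt(e1^2 + e2^2); serves for probable or standard errors."""
    return math.sqrt(error_m1 ** 2 + error_m2 ** 2)


def experimental_coefficient(d, se_d):
    """Assumed form of the experimental coefficient: D / (2.78 * sigma_D)."""
    return d / (2.78 * se_d)


def critical_ratio(d, pe_d):
    """Assumed form of the critical ratio: D / PE_D."""
    return d / pe_d


# Hypothetical case: D = 3.0 points; gains have SD 8.0 in each group of 64 pupils.
d = 3.0
se_1 = standard_error_of_mean(8.0, 64)
se_2 = standard_error_of_mean(8.0, 64)
se_d = error_of_difference(se_1, se_2)
pe_d = PE_FACTOR * se_d

print("standard error of D:", se_d)
print("probable error of D:", pe_d)
print("experimental coefficient:", experimental_coefficient(d, se_d))
print("critical ratio:", critical_ratio(d, pe_d))
```

Under these assumptions an experimental coefficient of 1.00 corresponds to a critical ratio of about 4.1, since 2.78 divided by 0.6745 is approximately 4.12. Neither ratio takes any account of the control of non-experimental factors.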

This procedure is subject to two criticisms. In the first 
place the use of the probable or standard error formulae implies 
that the experimental group and the control group are inde- 
pendent random samples of the universe. Random sampling 
is seldom feasible in selecting pupils for an experiment and if 
the two groups are chosen so they are equivalent, the samples 
cannot be independent. Lindquist and Wilks have proposed 
formulae for use when the groups have been matched.² The
use of these formulae, however, is limited. 

¹ The ratio $D / PE_D$, called the critical ratio (CR), is sometimes used. A value of
4.00 represents approximately the same situation as a value of 1.00 for the ex-
perimental coefficient.

² Wilks, S. S. “The Standard Error of the Means of Matched Samples,”
Journal of Educational Psychology, 22: 205-08, March, 1931.

Lindquist, E. F. “The Significance of a Difference between ‘Matched’
Groups,” Journal of Educational Psychology, 22: 197-204, March, 1931.

See also

Ezekiel, Mordecai. “‘Student’s’ Method for Measuring the Significance of
a Difference between Matched Groups,” Journal of Educational Psychology,
23: 446-50, September, 1932.

Lindquist, E. F. “A Further Note on the Significance of a Difference be-
tween the Means of Matched Groups,” Journal of Educational Psychology,
24: 66-69, January, 1933.

Ezekiel, Mordecai. “Reply to Dr. Lindquist’s ‘Further Note’ on Matched
Groups,” Journal of Educational Psychology, 24: 306-09, April, 1933.

Peters, C. C., and Van Voorhis, W. R. “A New Proof and Corrected Formulae
for the Standard Error of a Mean and of a Standard Deviation,” Journal of
Educational Psychology, 24: 620-33, November, 1933. (In formulae (G) and
(H) n should appear as √n.)

Monroe, W. S., and Engelhart, M. D. “A Critical Summary of Research
Relating to the Teaching of Arithmetic,” University of Illinois Bulletin, Vol. 29,
No. 5, Bureau of Educational Research Bulletin, No. 58. Urbana: University
of Illinois, 1931, pp. 100-07. Contains a description of the method proposed by
Lindquist and Wilks. A standard error obtained by this method need not be
regarded as a limit, as stated on page 105, if generalization is restricted to groups
of the same distribution of initial measures.

Walker, H. M. “Concerning the Standard Error of a Difference,” Journal of
Educational Psychology, 20: 53-60, January, 1929. Use of the long formula for
the difference between the mean gains of two equated groups is equivalent to
use of “Student’s” method. See both references to Ezekiel in this connection.
The second criticism is that the use of a probable error for-
mula does not include any consideration of the effects of failure
to control completely non-experimental factors or data faults
except variable errors of measurement.¹ This is not a criticism
of the procedure but rather of the practice of accepting proof 
of statistical significance as proof of the dependability of an 
experimental difference. The contribution to the obtained dif- 
ference from failure to control completely non-experimental 
factors and from data faults, especially systematic errors of 
validity, is frequently relatively large and hence requires con-
sideration. Since the assumption that the population of an
experiment may be considered a random sample is probably 
seldom justifiable and since a demonstration of the statistical 
significance of the experimental difference is not proof of its 
dependability, experimenters should endeavor to establish the 
dependability of their findings as generalizations by other 
means. Proof of the statistical significance of an experimental 
difference is likely to be of minor importance. 

The accomplishments of experimental research. Since con-
trolled experimentation is generally recognized as a fruitful
means of contributing to a science of education,² the accomplish-
ments of this type of educational research will be commented
on briefly. If experimental research in education is examined,
a number of studies with dependable findings will be found.
For example, the Judd-Buswell studies in the field of reading³
have added to our knowledge of the effect upon the mental
processes of a reader when instructions or materials are varied.
Studies relating to methods of teaching have added materially
to our understanding of the processes of teaching and learning.
The thesis that variations in teacher zeal and skill may be more 

¹ It has been shown that the probable error formula for the effect of sampling
includes also the effect of variable errors of measurement.

For proof see Huffaker, C. L., and Douglass, H. R. “On the Standard Errors
of the Mean Due to Sampling and to Measurement,” Journal of Educational
Psychology, 19: 643-49, December, 1928.

² The progress toward a science of education is the topic of the concluding
chapter of this volume.

³ One group of these studies was described briefly in Chapter I.
important than variations in instructional procedures may be
cited as a major contribution. The total list of accomplish-
ments of experimental studies, including the by-products,
would doubtless be a long one and considering that we probably 
are only now emerging from the pioneer stages of this type of 
research we are justified in pointing to our accomplishments 
with considerable pride. 

On the other hand, the relative number of dependable ex-
perimental studies is distressingly small. For example, in re-
porting a summary of the research relating to the methods of
teaching mathematics at the secondary level, Douglass¹ states
that the majority of more than two hundred studies examined
are not worthy of mention. His summary includes only thirty-
nine. Although it is likely that a number of the studies exam-
ined by Douglass were not controlled experiments, his state-
ment is indicative of the quality of experimental research in
this field. In the same number of the Review of Educational
Research, Grinstead summarizes only eight experimental studies
in the field of Latin and states that this list “includes all studies,
known to be available, which make any significant contribu-
tion, however small, to Latin classroom method.”² Summaries
of experimental studies relating to a particular problem such
as homogeneous grouping³ reveal inconsistencies in the several
findings and usually the reviewer states that few, if any, con-
clusions may be regarded as definitely established.⁴

Sometimes an experimenter announces a conclusion so obvious 

¹ Douglass, H. R. “Special Methods on the High School Level — Mathematics,”
Review of Educational Research, 2: 7, February, 1932.

² Grinstead, W. J. “Special Methods on High School Level — Latin,” Review
of Educational Research, 2: 56, February, 1932.

³ For example, see Rankin, P. T. “School Organization — Pupil Classification
and Grouping,” Review of Educational Research, 1: 215-29, June, 1931.

⁴ In a more recent review of the research relating to homogeneous grouping,
Douglass states: “While homogeneous grouping has been frequently found
more effective than heterogeneous grouping, the question has not been removed
from that of unsolved problems, and the difficulties of adequate experimentation
discourage hopes for any immediate definite answer.”

Douglass, H. R. “Certain Aspects of the Problem of Where We Stand with
Reference to the Practicability of Grouping,” Journal of Educational Research,
26: 344-53, January, 1933.
that one wonders why the inquiry was ever attempted. In con-
cluding a summary of research on the effect of “special methods
in techniques on comprehension,” Gray¹ states “the significant
fact about all these studies is that comprehension usually in-
creased when specific training to that end was provided.” This
conclusion appears to follow logically if we accept the thesis 
that children are teachable. 

A detailed examination of the procedure of experimentation, 
such as that attempted in the preceding pages of this chapter, 
reveals a number of crucial difficulties and suggests that they
cannot be overcome, at least in the case of many problems.
In a recent article Brownell² points out six faults that are
frequently found in experimental studies. He also points out
that experimentation in the field of education is not a simple 
undertaking and that many persons have attempted studies for 
which they were not properly equipped. 

The total picture is thus one that should challenge research 
workers. Although the accomplishments are by no means
negligible, they are certainly small in comparison with the total
number of experimental studies. The time and money invested
have yielded small returns. By way of explanation it may be
pointed out that experimental inquiry in the field of education
had scarcely begun by 1910 and that most of the work has been 
done since 1920. This explanation, however, is not adequate. 
When considered in the abstract, controlled experimentation 
seems to promise much. It is the procedure by means of which 
much has been accomplished in the field of the physical sciences.
It is easily understood in its general outlines. However, when
its application to the field of education is considered in detail,
it becomes an extremely complex method of research. The
cause of educational research, especially experimental research, 
has suffered from the enthusiasm of its friends. Shortly after 
1920 there was a concerted effort among certain leaders to 

¹ Gray, W. S. “Special Methods in the Elementary School — Reading,”
Review of Educational Research, 1: 253, October, 1931.

² Brownell, W. A. “Some Neglected Safeguards in Controlled Group Ex-
perimentation,” Journal of Educational Research, 27: 98-107, October, 1933.
stimulate quantity production of educational research and
teachers and other school people who had only very limited
training were encouraged to undertake experimental studies.
Under such conditions it was inevitable that there would be a
large number of studies possessing little or no merit.

The outlook for experimental research. Now that attention 
is being directed to the techniques of experimentation and the 
crucial difficulties are being pointed out, it seems reasonable to 
expect a material improvement in the quality of this type of
research. In the first place there should be fewer ill-advised
undertakings. The nature and scope of the dependent variable
should be specified in defining the problem. In the past this
has often received little or no attention. The problem has been
thought of as being to determine the effect of a specified change
in the experimental factor without attempting to specify the
nature of the effect to be measured. For example, in the study
of the effect of changing the grouping of pupils for instructional
purposes from a heterogeneous plan to a homogeneous plan,
practically no attention has been given to the nature of the
pupil achievement to be measured. Is the evaluation of the
two procedures to be based on the ground covered, extent of
the skills and memorized information acquired, ability to re-
spond to thought questions, pupil attitudes and interests, or
some combination of these outcomes of learning? Obviously
the problem is not adequately defined until the nature of the 
dependent variable has been specified. 

We may also expect more attention to be given to the sim- 
ilarity of the function of the two procedures being studied. 
Obviously, procedures that have dissimilar functions are not 
suitable for experimental study. It is easy to recognize this con-
dition when the functions are grossly dissimilar as in the case of
a method of instruction in arithmetic designed to engender calcu-
lation skills and a second method designed to engender problem
solving ability. It is, however, not always easy to identify
dissimilarities of function. For example, is the function of a
small class the same as that of a large class? Not necessarily,
especially if the experimental factor is considered to include
instructional procedures that are adapted to the conditions
created by the number of pupils to be instructed. The primary
function of a large class may be to contribute to the engendering
of information while that of a small class may be to contribute
to the engendering of the ability to discuss problems and issues
and to respond to other types of thought questions.

Specification of the nature of the dependent variable to be 
measured and consideration of the similarity of the functions 
of the procedures being compared will greatly reduce the num-
ber of ill-advised experimental studies. Adequate definition of
the problem should also reduce the number of attempts to prove
the obvious. There is also opportunity to increase the quality
of the research by improving the experimental procedure at
certain points. In many cases the duration of the experiment
should be extended. More attention should be given to the
control of non-experimental factors. An effort should be made
to secure more valid measures of the dependent variable.
Finally, the quality of experimental research may be increased
by more intelligent interpretation of the findings. Determina-
tions of the statistical significance of the experimental dif-
ference should be replaced by more appropriate considerations
of dependability. When at all feasible, the findings should be 
subjected to the test of experimental verification. The experi- 
menter should seek the explanation of his findings. Frequently, 
an explanation of why the results were obtained is more sig-
nificant than the actual findings.

ILLUSTRATIVE EXPERIMENTAL STUDIES 

The following references are not given as models of experimental proce- 
dure but rather as additional illustrations of techniques that have been 
employed. A number of the studies are laboratory experiments. In several
cases, the dependent variable is not pupil achievement. A few studies of
transfer of training have been included.

Barr, A. S., and Park, J. S. “An Experimental Study of Functional
Learning,” Journal of Experimental Education, 1: 9-17, September,
1932.
Two methods of studying were employed by groups of graduate students.
In the first, or direct, method, the students were instructed to memorize
the symbols of two artificial alphabets. In the second, or incidental, method,
the students were told to concentrate on the translation of meaningful
material. A rotation technique was used in which efforts were made to
measure and allow for practice effect. Objective tests, devised by the
experimenters, were used to measure immediate and delayed retention.
In addition to standard treatment of the data, learning curves were con-
structed. Additional data are reported with respect to the effects of atti-
tude, effort, and fatigue. The experimenters are to be commended for the
following statement: “The results are true only for the subjects, methods,
materials, conditions, and the learning measured in this experiment.”
Generalization to ordinary school learning would, of course, be unwar-
ranted.

Bergman, W. G., and Vreeland, Wendell. “Comparative Achievement
in Word Recognition under Two Methods of Teaching Beginning
Reading,” Elementary School Journal, 32: 605-16, April, 1932.
The “visual method” and the “picture-story method” of teaching
beginning reading, compared in this experiment, are described in detail.
Three pairs of schools were matched with respect to nationality and socio-
economic status of pupil populations and school organization. The teachers
participating in the experiment were matched with respect to principals’
ratings and “carefully devised regulations were put in force covering the
length of class period, outside practice in reading by pupils, and the prepara-
tion of supplementary instructional materials.” The pupils were tested
four times during the experiment. Vocabulary analyses are presented of the
contrasted reading materials and an analysis is given of the “relative fair-
ness” of the final forms of the tests with respect to the compared reading
methods.

Brown, A. E. “The Effectiveness of Large Classes at the College Level:
An Experimental Study Involving the Size Variable and the Size-
Procedure Variable,” University of Iowa Studies in Education, Vol. 7,
No. 3. Iowa City, Iowa: University of Iowa, 1932. 66 pp.
This experimenter sought “to measure the value of a set of procedures
believed to be suitable to a large group, by comparing the achievement of a
large group using these procedures with that of a small class taught by the
instructor’s usual type of instructional procedure.” The procedures are
described in detail. Lindquist and Wilks’ formula was used in calculating
the standard errors reported in the study. Included in the report are
“interpretations based on observational evidence” and student opinion on
the new procedures and on large classes as obtained by means of a ques-
tionnaire.
Buswell, G. T., and John, Lenore. “Diagnostic Studies in Arithmetic,”
Supplementary Educational Monographs, No. 30. Chicago: University
of Chicago Press, 1926. 212 pp.

Two laboratory studies and two single group experiments under school
conditions are reported in this monograph. In the first laboratory study,
eye movements in column addition were studied. In the second, time
analyses were made of the four fundamental operations.

Cattell, Psyche. “Constant Changes in the Stanford-Binet IQ,” Journal
of Educational Psychology, 22: 544-60, October, 1931.
In experimentation of which this study is typical, the dependent variable
is the IQ and the independent variable, or “experimental factor,” is time.

Charters, W. W. Motion Pictures and Youth, A Summary. New York:
The Macmillan Company, 1933. 66 pp.

In this reference is summarized a series of coordinated studies which
fall “into two groups: one, to measure the effect of motion pictures as such
upon children and youth, the other, to study current motion-picture con-
tent and children’s attendance at commercial movie-theaters to see what
they come in contact with when they attend them.” Studies of the first
group classify as experiments in which the experimental factor is “current
commercial motion pictures” and the dependent variables studied are
“information, attitudes, emotions, health, and conduct.” Some of the
experimentation was of the laboratory type.

Clark, Mildred, and Worcester, D. A. “A Comparison of the Results
Obtained from the Teaching of Shorthand by the Word Unit Method
and the Sentence Unit Method,” Journal of Educational Psychology,
23: 122-31, February, 1932.

One hundred and nine pupils in six high schools were taught by the
sentence unit method and 83 pupils in five high schools were taught by the
word unit method. Comparisons were made between the achievements of
the two groups on several tests and also between the achievements of two
groups of 44 pupils each paired from the entire group on the basis of in-
telligence test scores and chronological age.

Collings, Ellsworth. An Experiment with a Project Curriculum. New
York: The Macmillan Company, 1923. 346 pp.
The experimental factor in this investigation was essentially that of
variation in the type of learning exercise. The experimental group of 41 ru-
ral pupils proposed the learning exercises themselves, while the control
group of 60 rural pupils had the traditional type of learning exercises as-
signed to them. At the end of four years, achievement was measured by a
number of standardized tests. Data are presented with respect to outcomes
other than those measured by the tests. The conclusions are favorable to
the project type of learning exercise, but the experimenter may be criticized
for failure to control important non-experimental factors including the zeal
of the teachers.

Douglass, H. R. “The Experimental Comparison of the Relative Effec-
tiveness of Two Sequences in Supervised Study,” University of Oregon
Publications, Vol. 1, No. 4. Eugene: University of Oregon, 1927,
pp. 173-218. For a briefer account see Douglass, H. R. “An Experi-
mental Investigation of the Relative Effectiveness of Two Plans of
Supervised Study,” Journal of Educational Research, 18: 239-45,
October, 1928.

The problem was to determine the relative effectiveness of the study-
recite sequence in supervised study as compared with the recite-study
sequence. Ten pairs of groups, averaging 14 pupils each, were selected and
carefully equated on the basis of age and a composite of regressed, or
estimated true, intelligence test scores and achievement test scores. To
equate teacher factors and those of room environment, teachers and rooms
were exchanged at the mid-point of the experiment. At the end of eleven
weeks, the final achievement tests were administered. In interpreting the
differences in mean gains, the long formula for the standard error of a
difference was used. Douglass also presents information with respect to the
teachers’ attitudes toward the experimental factor and the relation of the
experimental factor to variability in achievement.

Dynes, J. J. “Comparison of Two Methods of Studying History,” Journal
of Experimental Education, 1: 42-45, September, 1932.

In studying social science materials, one group read and reread the
material while the pupils of the other group gave the material a rapid read-
ing, reread the material, underlining essential parts, and taking notes; and
recalled what was read. A rotation technique was employed with 144 pupils
in two high schools. Each set of study material was used half of the time
with one method and half of the time with the other. A test was given
before and after study of the material, and retention was tested three weeks
later, a very commendable procedure. The following statements reveal a
somewhat unique method of handling data: “. . . of the 144 pupils who
participated in the final experiments, 67 learned more with Method X, 73
with Method Y, and 4 pupils made equal gains with both methods. For re-
tention, Method X proved to be the better for 49 pupils and Method Y was
better for 76, while 9 pupils retained equal amounts with either method.”
This method of handling data seems more meaningful than the “statistically
significant” differences reported.

Eaton, M. T. “The Effect of Praise, Reproof, and Exercise upon Muscular
Steadiness,” Journal of Experimental Education, 2: 44-59, September,
1933.

The apparatus used to measure the effect of praise, reproof, and exercise
upon muscular steadiness was a plate-and-stylus tester of the Whipple type.
The subject held the stylus horizontally at arm’s length and placed it in the
hole in the vertical contact plate. The operator “pressed the key that
connected the stylus and contact plate with the electric counter and
simultaneously started the stop watch . . . only the net number of contacts in the
specified ten second testing period was recorded.” The subjects were tested
several times in different types of stimulus situations. The responses are
analyzed in detail.

Horton, R. E. “Measurable Outcomes of Individual Laboratory Work
in High School Chemistry,” Teachers College, Columbia University
Contributions to Education, No. 303. New York: Bureau of Publications,
Teachers College, Columbia University, 1928. 105 pp.

Several experimental investigations of the individual laboratory, lecture
demonstration, and problem methods in high school chemistry are
reported in this monograph. After preliminary experimentation, this
experimenter set up nine groups, varying in size from 26 to 128 pupils,
approximately equivalent with respect to means and standard deviations on the
mid-term examination in chemistry. Horton’s experimentation is
commendable in several respects: comparatively large groups were used,
experimental factors were precisely defined, the groups were probably of
adequate equivalence, precautions were taken to secure control of important
non-experimental factors, and efforts were made to measure a variety of
outcomes.

Hudelson, Earl. Class Size at the College Level. Minneapolis: University
of Minnesota Press, 1928. 299 pp.

Data were collected with respect to the attitudes of students and teachers
toward class size by means of questionnaires, and information relative to
the instructional techniques employed by teachers reputed to be skillful
with large classes was collected by observation. Trends in class size were
investigated and the costs of large and small classes compared. The effects
of class size, as measured by term marks, were investigated. An extensive
program of experimentation was carried on which involved 6059 students
in 104 classes taught by 21 instructors.

Hurlock, E. B. “An Evaluation of Certain Incentives Used in School 
Work,” Journal of Educational Psychology, 16: 145-59, March, 1925 

Four groups of elementary school pupils were used. After equating with
respect to several factors, the members of the first group were praised in
the presence of their classmates with respect to their achievement on the
initial test, the members of the second group were reproved, the members of
the third group heard this praise and reproof, and the fourth group was
taught separately. This procedure was repeated. The following differences
in achievement are reported: praised-control, reproved-control,
ignored-control. These differences are accompanied by their probable errors. The
data are analyzed to show the relation of the effects of the motivating
factors to age, sex, initial ability, and accuracy.

Judd, C. H., and Buswell, G. T. “Silent Reading: A Study of the Various
Types,” Supplementary Educational Monographs, No. 23. Chicago:
University of Chicago Press, 1922. 160 pp.

In this laboratory study, the effects on eye movements of changes in
content of reading were studied. These changes included changes in
difficulty, changes in language, and changes in attitude or purpose.

Knowlton, D. C., and Tilton, J. W. Motion Pictures in History Teaching.
New Haven: Yale University Press, 1929. 182 pp.

In this experiment, photoplays supplemented the instruction in American
history of the experimental group. The pupils of the control group were
given supplementary pages containing the information presented in the
photoplays but not in the regular text. This experiment is commendable
in that efforts were made to measure a variety of outcomes. This is
indicated in the following section titles: “The Effect of Photoplays upon
Retention” and “Comparison of Experimental and Control Groups as to
Participation in Classroom Discussion, Expression of Interest, and
Voluntary Reading.”

Leonard, J. P. “The Use of Practice Exercises in the Teaching of
Capitalization and Punctuation,” Journal of Educational Research, 21: 186-90,
March, 1930. See also Leonard, J. P. “The Use of Practice Exercises
in the Teaching of Capitalization and Punctuation,” Teachers College,
Columbia University Contributions to Education, No. 372. New York:
Bureau of Publications, Teachers College, Columbia University, 1930.
78 pp.

Eighty-two eighth- and ninth-grade pupils were paired on the basis of
composite scores derived from the scores on several tests. “In addition to
the regular mimeographed lesson sheets, the experimental group received
special practice exercises in proofreading, error correction, and dictation.”
Twenty-five minutes of each period were devoted to these. The control
group followed the more conventional method “such as picking out
punctuation marks from paragraphs and citing rules for their use, writing
compositions and reviews for books and plays, and formulating sentences to
illustrate certain rules.” Several tests were used to measure achievement.
Three compositions and two letters written by the pupil were carefully
scored for errors. In interpreting the experimental differences, both the
long and short formulae were used, but no allowance was made for selection
resulting from pairing. (See page 309.)

Maller, J. B. “Cooperation and Competition, an Experimental Study in
Motivation,” Teachers College, Columbia University Contributions to
Education, No. 384. New York: Bureau of Publications, Teachers
College, Columbia University, 1929. 176 pp.

Maller used 814 experimental and 724 control pupils in this investigation
of individual and group competition as motivating factors. The
experimental pupils, alternately stimulated by individual recognition and
reward and by group or class recognition and reward, solved addition
examples. The investigator states in this connection: “The motives of self
and class were alternated six times, respectively. The problem of practice
effect was thus practically eliminated. All conditions of work aside from the
motives were identical.” The difference in favor of individual competition
was almost thirteen times its probable error. There is little reason to doubt
the “statistical” significance of this difference. The experimental
conditions, however, may be characterized as abnormal. The effectiveness of
competition would lessen with continued use.

Nelson, M. J. “The Differences in the Achievement of Elementary School
Pupils before and after the Summer Vacation,” University of Wisconsin,
Bureau of Educational Research Bulletin, No. 10. Madison: University
of Wisconsin, 1929. 48 pp.

The experimental factor and the dependent variable are evident in the
title of this study. An additional factor which was studied in the case of
arithmetic and spelling was the “time elapsed before pupils return to the
Spring level of achievement.”

Olander, H. T. “Transfer of Learning in Simple Addition and
Subtraction,” Elementary School Journal, 31: 358-69, 427-37, January,
February, 1931.

This investigator used 300 pairs of second-grade pupils equivalent with
respect to growth in arithmetic ability over a period of five weeks. For
twelve weeks the pupils of the experimental group were given instructions in
generalizing for three minutes of the daily twenty-minute period.
Achievement was tested several times during the experiment. The criticism seems
justifiable that the experimental factor, generalizing instruction, was not
applied for a sufficient time or intensively enough to add materially to
the generalizing abilities acquired by the pupils on their own account. The
experiment is unique in the technique used to secure equivalence of groups.

Reeder, E. H. “A Method of Directing Children’s Study of Geography,”
Teachers College, Columbia University Contributions to Education,
No. 193. New York: Bureau of Publications, Teachers College,
Columbia University, 1925. 98 pp.

In this rotation experiment in seventh-grade geography the study of the
pupils during experimental learning differed from that of the pupils during
control learning in that the former involved the use of mimeographed sheets
of study questions with each assignment. While the rotation technique
may be criticized with respect to carry-over of study habits from
experimental learning to control learning, this limitation is one which operates to
reduce differences in achievement. Reeder’s findings are significantly
favorable to the directive procedure which constituted the experimental
factor in spite of this limitation.

Rulon, P. J. The Sound Motion Picture in Science Teaching. Cambridge:
Harvard University Press, 1933. 236 pp.

A study was made of the socio-economic levels of the communities in
which the schools participating in this experiment were located. Other
factors studied include the geographical distribution of the pupils, the
organization of their schools, the types of enrollment in general science, and the
comparative teaching loads and proficiency of the teachers. The pupils
were equated on the basis of intelligence test scores, achievement test
scores, and chronological ages. Equivalence was checked with respect to
geographical location, “occupational-status” scores, pupil-hour load of the
teacher, size of class, and sex. The experimenter aided in the production of
the sound films and text used, in an effort to have them supplementary to
each other, and prepared the achievement tests used.

Starch, Daniel, and Elliott, E. C. “Reliability of the Grading of
High-School Work in English,” School Review, 20: 442-57, September, 1912.

This well known pioneer study may be regarded as an experiment in
which the “personal equation” is the experimental factor and the mark
assigned to an examination paper the dependent variable. Facsimiles of
two examination papers in English were rated independently by a number
of teachers. In general, a change in the person rating the paper resulted in a
change in the grade assigned.

Wood, B. D., and Freeman, F. N. An Experimental Study of the
Educational Influences of the Typewriter in the Elementary School Classroom.
New York: The Macmillan Company, 1932. 214 pp.

Several thousand pupils participated in this experiment. The
experimental and control teachers are shown to be approximately equivalent with
respect to several traits or characteristics. The experimental pupils and the
control pupils are shown to be equivalent with respect to CA, MA, IQ, and
initial achievement. A number of standardized tests were used in measuring
achievement. The report contains interesting graphic presentations of data.
The following titles indicate the scope of the research: “Comparative Gains
in General Educational Achievement,” “Comparative Gains in Individual
Subject Matters,” “Writing Done by Experimental and Control Children,”
“The Typewriter and the General Aims of Education,” “The Typewriter
and the Pupil’s School Interests,” and “The Typewriter and Reading.”



CHAPTER X 

STUDYING PROBLEMS OF PREDICTION 

The problems of prediction.¹ Intelligence test scores and
other prognostic measures are used as a basis for predicting
school marks and future status in other lines of endeavor.
Predictions from such variables may be made in several ways. A
person may make estimates without conforming to any definite
procedure. For example, in predicting success in college, a high
school principal might study a student's record, including
intelligence test score, chronological age, occupation of father,
participation in activities, and the like, giving to each item the
weight he considers appropriate in the particular case.
Predictions made in this way may be highly accurate, but the
procedure is not systematic. Usually predictions are thought of as
being made by means of some systematic procedure such as is
represented by a formula.

The two general problems are those of seeking out the best 
prognostic measures for a given situation and deriving the most 
effective systematic plan or formula for making the desired 
predictions or estimates from them. In both cases, the purpose 
is to reduce the error of prediction or estimate to a minimum. 
Hence, the development of techniques for determining the
magnitude of errors of estimate creates a subordinate problem
of major importance.

A. Methods of Making Predictions 

Types of prediction formulae. The simplest systematic plan
for making predictions objectively from a single independent

¹ The study of time series is not included in this chapter. Brief reference to
the general procedure of forecasting from such data was made on pages 235 and
238-39. The reader who is interested in prediction from historical statistics will
find treatments in a number of texts. A good reference is Chaddock, R. E.
Principles and Methods of Statistics. Boston: Houghton Mifflin Company, 1925,
Chapter XIII.
variable is to transform the prognostic measures to the scale of
the dependent variable and to use the transformed measures
as the predictions. If $\bar{X}_0$ designates the transformed measures
or the predictions, the formula is derived as follows:

$$\frac{\bar{X}_0 - M_0}{\sigma_0} = \frac{X_1 - M_1}{\sigma_1}$$

$$\bar{X}_0 = \frac{\sigma_0}{\sigma_1}(X_1 - M_1) + M_0$$

If the predictions are to be made from two or more prognostic
measures (independent variables) they may be combined into a
weighted sum. This derived variable is then used as $X_1$ in the
above formula.

If $X_0$ is taken as the criterion, the error of prediction or
estimate is represented by $X_0 - \bar{X}_0$. When the above procedure is
employed, the magnitude of the errors of estimate for a typical
population will be somewhat larger than when the regression
equation is used as the formula of prediction. If the
relationship between the two variables $X_0$ and $X_1$ is linear, the
regression equation¹ is

$$\bar{X}_0 = r_{01}\frac{\sigma_0}{\sigma_1}X_1 + M_0 - r_{01}\frac{\sigma_0}{\sigma_1}M_1$$
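
The difference between the two single-variable formulae can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the scores are hypothetical and invented only for illustration, and population standard deviations (dividing by N) are assumed.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def std(values):
    # Population standard deviation (divides by N); an assumption of this sketch.
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def correlation(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (std(x) * std(y))

def predict_by_transformation(x1, x0):
    # Predicted X0 = (sigma0 / sigma1) * (X1 - M1) + M0
    slope = std(x0) / std(x1)
    m1, m0 = mean(x1), mean(x0)
    return [slope * (v - m1) + m0 for v in x1]

def predict_by_regression(x1, x0):
    # Predicted X0 = r01 * (sigma0 / sigma1) * (X1 - M1) + M0
    slope = correlation(x1, x0) * std(x0) / std(x1)
    m1, m0 = mean(x1), mean(x0)
    return [slope * (v - m1) + m0 for v in x1]

def sum_squared_errors(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Hypothetical scores, invented purely for illustration.
x1 = [20, 25, 30, 35, 40, 45, 50]
x0 = [60, 70, 66, 78, 80, 79, 92]

sse_transformation = sum_squared_errors(x0, predict_by_transformation(x1, x0))
sse_regression = sum_squared_errors(x0, predict_by_regression(x1, x0))
print(sse_transformation, sse_regression)
```

For any data in which the correlation is below 1.00, the sum of squared errors for the regression equation should come out smaller than that for the simple transformation, which is the point made in the paragraph above.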

If the predictions are to be made from two or more independent
variables, $X_1$, $X_2$, $X_3$, etc., the multiple regression
equation is

$$\bar{X}_0 = b_{01.23 \ldots n}X_1 + b_{02.134 \ldots n}X_2 + \ldots + b_{0n.123 \ldots (n-1)}X_n + C$$

As in the case of predictions made from a single independent
variable, the constants in the right-hand member of the
equation are to be defined so that $\Sigma(X_0 - \bar{X}_0)^2$ will be a minimum.

¹ The basis of the derivation of the regression equation is the requirement
that $\Sigma(X_0 - \bar{X}_0)^2$ be a minimum. For the derivation, see Holzinger, K. J.
Statistical Methods for Students in Education. Boston: Ginn and Company,
1928, p. 159.
This condition is satisfied¹ if

$$b_{01.23 \ldots n} = r_{01.23 \ldots n}\frac{\sigma_{0.23 \ldots n}}{\sigma_{1.23 \ldots n}}$$

$$b_{02.134 \ldots n} = r_{02.134 \ldots n}\frac{\sigma_{0.134 \ldots n}}{\sigma_{2.134 \ldots n}}$$

$$b_{0n.123 \ldots (n-1)} = r_{0n.123 \ldots (n-1)}\frac{\sigma_{0.123 \ldots (n-1)}}{\sigma_{n.123 \ldots (n-1)}}$$

$$C = M_0 - b_{01.23 \ldots n}M_1 - b_{02.134 \ldots n}M_2 - \ldots - b_{0n.123 \ldots (n-1)}M_n$$

If the variables are expressed in terms of deviation measures,
the constant term is zero and we have

$$\bar{x}_0 = b_{01.23 \ldots n}x_1 + b_{02.134 \ldots n}x_2 + \ldots + b_{0n.123 \ldots (n-1)}x_n$$

If the variables are expressed in terms of standard deviation
measures, $\beta$ (beta) is used as the designation of the regression
coefficients:

$$\bar{z}_0 = \beta_{01.23 \ldots n}z_1 + \beta_{02.134 \ldots n}z_2 + \ldots + \beta_{0n.123 \ldots (n-1)}z_n$$

This is referred to as the standard score regression equation. The
relation between the two types of regression coefficients is

$$b_{01.23 \ldots n} = \beta_{01.23 \ldots n}\frac{\sigma_0}{\sigma_1}$$

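Hence the general form of the regression equation may be
written

$$\bar{X}_0 = \beta_{01.23 \ldots n}\frac{\sigma_0}{\sigma_1}X_1 + \beta_{02.134 \ldots n}\frac{\sigma_0}{\sigma_2}X_2 + \ldots + \beta_{0n.123 \ldots (n-1)}\frac{\sigma_0}{\sigma_n}X_n + C$$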

When the relationship between the two variables is not linear, 
an equation of prediction may be derived by the methods of 

¹ For the proof of this statement see Kelley, T. L. Statistical Method. New
York: The Macmillan Company, 1923, pp. 283 f.
curve-fitting.¹ This equation may be any one of a number of
types such as

Xo = AXl + C 
$\bar{X}_0 = A \log X_1 + C$

AXl + C 

Toops has proposed a “generalized regression” equation for
which he claims superiority as a formula of prediction. The
general form of Toops’ equation² is

$$\bar{X}_0 = [1 + f_1(X_1)]\,[1 + f_2(X_2)]\,[1 + f_3(X_3)] \ldots$$

The determination of the constants in the selected type of
equation may be accomplished by the method of averages, which
imposes the requirement that the algebraic sum of the
differences, $X_0 - \bar{X}_0$, or residuals equals or approaches zero, or by
the method of least squares, which imposes the requirement that
the sum of the squares of the residuals be a minimum.
Sometimes, more complex methods are used.
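
The two requirements can be contrasted for the simplest case, a straight line with constants A and C. The sketch below is illustrative only and not taken from the original: it assumes that the method of averages is applied by splitting the ordered data into two groups (one per constant) and forcing the residuals in each group to sum to zero, and the function names are hypothetical.

```python
def fit_by_averages(x1, x0):
    # Method of averages (as assumed here): order the pairs, split them into
    # two groups, and require the residuals within each group to sum to zero.
    pairs = sorted(zip(x1, x0))
    half = len(pairs) // 2
    group_a, group_b = pairs[:half], pairs[half:]
    sx_a, sy_a, n_a = sum(p[0] for p in group_a), sum(p[1] for p in group_a), len(group_a)
    sx_b, sy_b, n_b = sum(p[0] for p in group_b), sum(p[1] for p in group_b), len(group_b)
    # Solve  A*sx + C*n = sy  for the two groups (Cramer's rule).
    det = sx_a * n_b - sx_b * n_a
    a = (sy_a * n_b - sy_b * n_a) / det
    c = (sx_a * sy_b - sx_b * sy_a) / det
    return a, c

def fit_by_least_squares(x1, x0):
    # Least squares: minimize the sum of the squared residuals.
    n = len(x1)
    m1, m0 = sum(x1) / n, sum(x0) / n
    a = (sum((u - m1) * (v - m0) for u, v in zip(x1, x0))
         / sum((u - m1) ** 2 for u in x1))
    return a, m0 - a * m1

# Hypothetical data, invented for illustration.
x1 = [1, 2, 3, 4, 5, 6]
x0 = [5, 8, 8, 12, 12, 16]
print(fit_by_averages(x1, x0))
print(fit_by_least_squares(x1, x0))
```

The two fits generally differ slightly; both leave residuals whose algebraic sum is zero, but only the least squares line minimizes the sum of their squares.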

It should be noted that a prediction formula is derived from 
measures of one population and then applied to another. In 
other words, the population, from which the data for deriving 
the equation were obtained, is considered a representative 
sample of a larger population. Obviously, the accuracy of the 
predictions will be conditioned by the representativeness of 
the sample. Frequently, the requirement of representativeness
can be only approximated. For example, in deriving a formula 
for predicting the college success of high school graduates, one 
is limited in securing data to high school graduates who enter 
college and remain long enough to secure marks. Since this 
group is somewhat selected, a sample of college freshmen will 

¹ See Holzinger, op. cit., pp. 317 f.

Ezekiel, Mordecai. Methods of Correlation Analysis. New York: John Wiley
and Sons, Inc., 1930, Chapter VI.

² Toops, H. A. “Empirical Psychology and the ‘Generalized Regression’
Equation,” Ohio College Association Bulletin, No. 81. Columbus, Ohio: Ohio
State University, 1929, p. 1005.
not be entirely representative of the population of high school
graduates.

Prediction by graphical methods. When predictions are to be
made from a single independent variable, a graphical procedure
may be employed. One advantage of this method is that it is
not necessary to assume a linear relationship between the
variables. The data are tabulated in a correlation table with 
the scale of the independent variable in the horizontal position. 
Then the mean of each column is calculated. These means are 
plotted as ordinates and the corresponding mid-points of the 
intervals of the horizontal scale, as abscissas. Through the 
points thus located, the best fitting curve is drawn. In draw- 
ing the curve, it should be recognized that the means at the 
extremes are based upon few cases and hence probably are not 
highly reliable. Hence, the curve may not fit the extreme points 
as closely as those for means based upon the larger number of 
cases. 
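
The column means that are plotted in this procedure can be tabulated as follows. This is a minimal sketch rather than the authors' own tabulation: the interval width, the convention that an interval runs from k·width up to (but not including) (k+1)·width, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict

def column_means(x1, x0, width):
    # Group the dependent scores (x0) by class interval of the independent
    # variable (x1), as in the columns of a correlation table.
    columns = defaultdict(list)
    for a, b in zip(x1, x0):
        columns[int(a // width)].append(b)
    # One row per column: (mid-point of interval, mean of column, number of cases).
    return [((k + 0.5) * width, sum(v) / len(v), len(v))
            for k, v in sorted(columns.items())]

# Hypothetical scores, invented for illustration.
for midpoint, mean_x0, cases in column_means([1, 2, 6, 7, 8], [10, 20, 30, 40, 50], 5):
    print(midpoint, mean_x0, cases)
```

Keeping the number of cases beside each mean makes it easy to see which points, usually those at the extremes, deserve less weight when the curve is drawn.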

In Figure 4 the means of the columns are given at the bottom
and the line labeled A connects the points located by using them
as ordinates. Since the relationship is apparently linear, the
regression line labeled B in the figure probably represents the best
fitting curve. Except at the extremes where the means are not
dependable because the number of cases is small, predictions
from line A will be approximately the same as those made from
the regression equation¹ represented by line B. For example, a
child whose Otis Score ($X_1$) is 35 would be likely to secure a New
Stanford Reading Score of 76 or 77 ($X_0$). The value obtained
from the regression equation is 76.38.
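
The arithmetic of this example can be verified directly. The sketch below assumes the regression equation cited in the accompanying footnote reads as a slope of .98 and a constant of 42.08; the function name is hypothetical.

```python
def predict_reading_score(otis_score, slope=0.98, intercept=42.08):
    # Predicted X0 = .98 * X1 + 42.08, as read from the chapter's footnote.
    return slope * otis_score + intercept

print(round(predict_reading_score(35), 2))  # 76.38
```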

Prediction from the curve of relation versus prediction from 
the regression equation. The use of a regression equation as a 
formula of prediction gives the procedure the appearance of a 
high degree of accuracy. This is unfortunate. Even when the 
regression equation has been derived from a sample representa- 
tive of the population within which predictions are to be made, 
the error of estimate, $X_0 - \bar{X}_0$, will approach zero in only a

¹ The regression equation is $\bar{X}_0 = .98X_1 + 42.08$. See page 324.

[Figure 4. . . . scores on the Otis Self-Administering Test of Mental Ability.]
few cases unless $r_{01}$ is very near to 1.00. In a typical prediction
situation a number of the errors of estimate will be relatively
large. This fact is apparent when the regression line is drawn
on a correlation chart, because the error of prediction will be
zero in the case of only the pairs of data which form coordinates
of points on the line. Since the regression line passes through
relatively few points, it follows that in general the predictions
involve errors. The obviousness of the errors of estimate may
be cited as an advantage of the graphical method, especially
when predictions are being made by persons not familiar with
regression equations and errors of estimate. Another advantage
is that linearity of regression is not a requirement. A line can
be drawn to fit any curvilinear relationship that is apparent.
A third advantage is that after the line of relationship has been
drawn, predictions may be made more quickly than when
values are substituted in an equation.¹ Hence, when only one
independent variable is involved, the graphical method is to be
preferred as a practical procedure.²

¹ Both Hull and Segel have described a method for obtaining predictions
automatically from electric tabulating and accounting machines. Such a method
results in a great saving of time when a large number of predictions are to be
made from a multiple regression equation.

Hull, C. L. “An Automatic Machine for Making Multiple Aptitude
Forecasts,” Journal of Educational Psychology, 16: 593-98, December, 1925.

Segel, David. “The Automatic Prediction of Scholastic Success by Using the
Multiple Regression Equation Technique with Electric Tabulating and
Accounting Machines,” Journal of Educational Psychology, 22: 139-44, February,
1931.

² Griffin has shown how predictions from a multiple regression equation may
be accomplished by graphical methods.
Griffin, Harold D. “Constructing a Prediction Chart (Charting Linear
Regression Equations),” Journal of Applied Psychology, 16: 406-12, August,
1932.

The graphical method is also applicable to time series such
as enrollment statistics over a period of years. As a means
of making the general trend more apparent, the distribution
may be smoothed by the method of moving averages. The
simplest moving average is formed by taking the mean of the
data for three successive years as the “smoothed” enrollment
for the middle year of the sequence. The graphical representation
of the “smoothed” enrollments affords a means of
forecasting future enrollments.¹

Technique of deriving the regression equation when there 
are two or more independent variables. For two independent 
variables, the regression equation may be written as follows: 


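$$\bar{X}_0 = \frac{\sigma_0(r_{01} - r_{02}r_{12})}{\sigma_1(1 - r_{12}^2)}X_1 + \frac{\sigma_0(r_{02} - r_{01}r_{12})}{\sigma_2(1 - r_{12}^2)}X_2 + C$$

$$C = M_0 - \frac{\sigma_0(r_{01} - r_{02}r_{12})}{\sigma_1(1 - r_{12}^2)}M_1 - \frac{\sigma_0(r_{02} - r_{01}r_{12})}{\sigma_2(1 - r_{12}^2)}M_2$$

When written in this form, the plan of computation is apparent.
It is only necessary to know $M_0$, $M_1$, $M_2$, $\sigma_0$, $\sigma_1$, $\sigma_2$, $r_{01}$, $r_{12}$, and
$r_{02}$.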
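
The plan of computation for two independent variables can be sketched as a short function taking only the nine statistics just listed. This is an illustrative sketch; the function and parameter names are hypothetical, and the numerical values in the usage lines are invented.

```python
def two_variable_regression(m0, m1, m2, s0, s1, s2, r01, r02, r12):
    # Regression coefficients for X1 and X2, then the constant term C.
    b1 = s0 * (r01 - r02 * r12) / (s1 * (1 - r12 ** 2))
    b2 = s0 * (r02 - r01 * r12) / (s2 * (1 - r12 ** 2))
    c = m0 - b1 * m1 - b2 * m2
    return b1, b2, c

# Hypothetical means, standard deviations, and correlations.
b1, b2, c = two_variable_regression(
    m0=75.0, m1=40.0, m2=50.0,
    s0=10.0, s1=8.0, s2=12.0,
    r01=0.60, r02=0.50, r12=0.40,
)
predicted = b1 * 45 + b2 * 55 + c  # prediction for X1 = 45, X2 = 55
print(b1, b2, c, predicted)
```

When both independent variables are at their means, the prediction equals the mean of the dependent variable, which serves as a convenient check on the constant term.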

The regression equation for any number of independent
variables may be derived from the general form given on pages
324-25. The regression coefficients (b’s) consist of the product of
a coefficient of partial correlation² and the ratio of two partial
standard deviations:

$$b_{01.23 \ldots n} = r_{01.23 \ldots n}\frac{\sigma_{0.23 \ldots n}}{\sigma_{1.23 \ldots n}}$$

A coefficient of partial correlation of any order can be
expressed in terms of coefficients of lower order. For example, a
first order partial may be expressed as follows:

$$r_{01.2} = \frac{r_{01} - r_{02}r_{12}}{\sqrt{1 - r_{02}^2}\,\sqrt{1 - r_{12}^2}}$$

In the case of a partial of (n - 1)th order for n + 1 variables³
the relationship is

$$r_{01.23 \ldots n} = \frac{r_{01.23 \ldots (n-1)} - r_{0n.23 \ldots (n-1)}\,r_{1n.23 \ldots (n-1)}}{\sqrt{1 - r_{0n.23 \ldots (n-1)}^2}\,\sqrt{1 - r_{1n.23 \ldots (n-1)}^2}}$$

This general relationship makes it possible to build up a partial 

¹ For further treatment of time series, see Chaddock, op. cit., pp. 306 f.

² See pp. 377 f.

³ In this case the variables are $x_0, x_1, \ldots x_n$. If the variables are
$x_1, x_2, x_3, \ldots x_n$, the order is (n - 2) for n variables.




correlation coefficient of any order from zero order coefficients.
The general formula for obtaining the partial standard
deviations is

$$\sigma_{0.123 \ldots n} = \sigma_0\sqrt{1 - r_{01}^2}\,\sqrt{1 - r_{02.1}^2}\,\sqrt{1 - r_{03.12}^2} \cdots \sqrt{1 - r_{0n.123 \ldots (n-1)}^2}$$
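
The way a partial correlation of any order is built up from zero order coefficients, and then used in the partial standard deviation, can be sketched recursively. The sketch is illustrative only: storing the zero order correlations in a dictionary keyed by pairs of variable numbers is an assumption of this example, as are the function names and the numerical values.

```python
from math import sqrt

def partial_r(r, i, j, controls):
    # r maps frozenset({a, b}) to the zero order correlation of variables a and b.
    # controls lists the variables held constant; the last one is removed by the
    # recursion formula and the rest are handled by lower order partials.
    if not controls:
        return r[frozenset((i, j))]
    *rest, k = controls
    r_ij = partial_r(r, i, j, rest)
    r_ik = partial_r(r, i, k, rest)
    r_jk = partial_r(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / (sqrt(1 - r_ik ** 2) * sqrt(1 - r_jk ** 2))

def partial_sd(r, s0, predictors):
    # sigma_0.12...n = sigma_0 * sqrt(1 - r01^2) * sqrt(1 - r02.1^2) * ...
    sd = s0
    for index, k in enumerate(predictors):
        sd *= sqrt(1 - partial_r(r, 0, k, predictors[:index]) ** 2)
    return sd

# Hypothetical zero order correlations among variables 0, 1, and 2.
r = {frozenset((0, 1)): 0.60, frozenset((0, 2)): 0.50, frozenset((1, 2)): 0.40}
print(partial_r(r, 0, 1, [2]))      # first order partial r01.2
print(partial_sd(r, 10.0, [1, 2]))  # sigma_0.12 when sigma_0 = 10
```

The recursion makes the labor evident: each added variable roughly triples the number of lower order coefficients that must be computed, which is the difficulty the text goes on to describe.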

These general formulae for regression coefficients,
coefficients of partial correlation, and partial standard deviations
provide the basis for determining the regression equation for
any number of variables in terms of their means, standard
deviations, and intercorrelations. It is apparent, however, that
the arithmetical work indicated by these formulae will be very
laborious when the number of variables is greater than three
(two independent variables). Some economies may be effected
by systematizing the procedure,¹ but the maximum economy
is attained by a method known as “partial regression.”² This
method is based upon the following relationships known as
“normal equations”:³

¹ For example, Garrett gives in detail a plan of the calculations for three, four,
and five variables.

Garrett, H. E. Statistics in Psychology and Education. New York: Longmans,
Green and Company, 1926, pp. 228 f.

² The reader who wishes to make a study of this topic will find a brief historical
account in Griffin, H. D. “Partial Correlation versus Partial Regression for
Obtaining Multiple Regression Constants,” Journal of Educational Psychology,
22: 35-44, January, 1931.

This article gives references that may be consulted for further study. Some
more recent references are:

Bakst, A. “A Modification of the Multiple Correlation and Regression
Coefficients by the Tolley and Ezekiel Method,” Journal of Educational Psychology,
22: 629-35, November, 1931.

Horst, A. “A General Method for Evaluating Multiple Regression
Constants,” Journal of American Statistical Association, 27: 270-78, September,
1932.

Peters, C. C., and Wykes, E. C. “Simplified Methods for Computing
Regression Coefficients and Partial and Multiple Correlations,” Journal of Educational
Research, 23: 383-93, May, 1931.

³ Tolley, H. R., and Ezekiel, M. J. B. “A Method of Handling Multiple
Correlation Problems,” Journal of American Statistical Association, 18: 993-1003,
December, 1923.

Garrett, H. E. “A Modification of Tolley and Ezekiel’s Method of Handling
Multiple Correlation Problems,” Journal of Educational Psychology, 19: 45-49,
January, 1928.

The equations derived by Tolley and Ezekiel are not in the form given here,
but Garrett shows how they may be transformed by certain substitutions. Note
that his symbolism is different from that employed here.
The Doolittle method of solving simultaneous equations was
applied to these equations by Tolley and Ezekiel. Griffin¹ has
formulated an economical method for obtaining the values of
the beta coefficients² from these simultaneous equations. He
gives in detail the necessary formulae for three, four, five, six,
seven, and eight variable problems.
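
How beta coefficients are obtained from a set of simultaneous equations can be sketched as follows. This is an assumption-laden illustration, not the Doolittle or Griffin schema: it takes the normal equations in their usual standard score form (each equation sets a weighted sum of the betas, weighted by the intercorrelations of the independent variables, equal to the correlation of one independent variable with the dependent variable) and solves them by plain Gaussian elimination. The function name and the correlations are hypothetical.

```python
def solve_normal_equations(r_predictors, r_criterion):
    # r_predictors[i][j]: correlation between independent variables i and j
    # (1.0 on the diagonal); r_criterion[i]: correlation of variable i with
    # the dependent variable. Returns the beta coefficients.
    n = len(r_criterion)
    a = [list(map(float, row)) + [float(rhs)]
         for row, rhs in zip(r_predictors, r_criterion)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    # Back substitution.
    betas = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(a[row][k] * betas[k] for k in range(row + 1, n))
        betas[row] = (a[row][n] - known) / a[row][row]
    return betas

# Hypothetical correlations for two independent variables.
betas = solve_normal_equations([[1.0, 0.4], [0.4, 1.0]], [0.6, 0.5])
print(betas)
```

Each beta may then be converted to the corresponding b by multiplying by the ratio of the standard deviation of the dependent variable to that of the independent variable concerned.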

Differential prediction. Many students attain a higher
degree of success in one field of study than in another. For
example, a student taking courses in mathematics and in English
may make a higher average mark in the former field than in the
latter. The relative standing of a second student may be higher
in English. One of the problems confronting those who attempt
to render educational or vocational guidance is that of
predicting the field of endeavor in which success is most likely. A
regression equation may be formed for the difference in average
standing in two fields, $X_D$, as the dependent variable and a set
of prognostic measures as independent variables. This equation
may be used as the formula for predicting the difference in
degrees of success in the two fields as measured by the average

¹ Griffin, H. D. “Simplified Schema for Multiple Linear Correlation,”
Journal of Experimental Education, 1: 239-54, March, 1933.

² These are the products of the b’s and the ratios of standard deviations in
the form of the normal equations given here.

The term “beta coefficient” is applied to a coefficient of the regression equation
when each variable is expressed in terms of its own standard deviation as a unit.
When the variables are not so expressed, the obtained coefficients are equal to the
product of the corresponding beta coefficients and the ratio of the standard
deviation of the dependent variable to the standard deviation of the independent
variable whose coefficient is being considered. In terms of symbols,

$$\beta_{01.23 \ldots n} = b_{01.23 \ldots n}\frac{\sigma_1}{\sigma_0} \quad \text{and} \quad b_{01.23 \ldots n} = \beta_{01.23 \ldots n}\frac{\sigma_0}{\sigma_1}$$

The use of beta coefficients appears to have been suggested by Kelley.
marks received. Such prediction has been called differential.¹
The predictions $\bar{X}_D$ are subject to an error of estimate $\sigma_{D.1}$. If
it is assumed that the differences form a normal distribution
with the mean at zero, any prediction may be interpreted in
terms of the probability that the actual difference in average
standing in the two fields will have the same sign as the
predicted difference.²
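
One way such a probability might be computed is sketched below. The sketch rests on an assumption that is not spelled out in this passage: that the actual difference is normally distributed about the predicted difference with a standard deviation equal to the error of estimate. The function name is hypothetical, and the technique referred to in the text may differ in detail.

```python
from math import erf, sqrt

def probability_same_sign(predicted_difference, error_of_estimate):
    # Area of the normal curve on the same side of zero as the prediction,
    # assuming the actual difference is normally distributed about the
    # predicted difference with the error of estimate as its standard deviation.
    z = abs(predicted_difference) / error_of_estimate
    return 0.5 * (1 + erf(z / sqrt(2)))

# A predicted difference equal to one error of estimate.
print(round(probability_same_sign(4.0, 4.0), 4))  # 0.8413
```

A predicted difference of zero gives a probability of .50, that is, no basis for preferring either field.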

In order to secure measures of the dependent variable,
difference in average standing in the two fields, it is necessary that a
representative population attempt simultaneously to achieve
in both of the fields. This requirement creates a serious
practical difficulty, especially in the case of fields of vocational
activity. As a means of avoiding this difficulty, the same
prognostic measures may be secured for a representative population
in each of the two fields of endeavor. If the regression equations
derived from these sets of data are expressed in standard score
form, the predictions for a given individual will indicate the
probable relative degree of success in the two fields.³

It should be noted that our present prognostic measuring
instruments have been developed for ordinary prediction, and
hence they may not be highly efficient for differential
prediction. As the latter type of prediction is studied, it may be that
superior differential prognostic instruments will be identified.
The practical value of differential prediction appears to justify
research directed to this end.

¹ Segel, David. “Differential Prediction of Ability as Represented by College
Subject Groups,” Journal of Educational Research, 25: 14-26, 93-98, January
and February, 1932.

² For the technique by means of which this may be accomplished, see page 106.

³ For an illustration of this type of differential prediction, see Waits, J. V.
“The Differential Predictive Value of the Psychological Examination of the
American Council on Education,” Journal of Experimental Education, 1: 264-71,
March, 1933.

B. Measures of the Accuracy of Predictions

Standard error of estimate: one independent variable. The
accuracy of predictions made by a given formula may be
determined by applying it within a representative population,
obtaining the criterion measures, and computing the errors of
estimate. For example, a formula for predicting scholastic
success of high school graduates when they enter college may be
applied to a representative population of college entrants. At
the end of the year the marks received may be secured and the
errors of estimate calculated. An average of these errors will
be a measure of the accuracy of the predictions.

This method is simple and reveals the total error of estimate
due to all causes operating in the case of the population
to which the application is made. It, however, requires time,
usually a year or longer, to secure the criterion measures. For
this reason, the formula is usually tried out with respect to a
theoretical, normally distributed population for which the
means and standard deviations are the same as these statistics
for the population from which the data for deriving the formula
were obtained. If the measured status is the criterion, the error
of estimate is represented by $X_0 - \bar{X}_0$. If the true measure of
the status is the criterion, the error of estimate is represented¹
by $X_\infty - \bar{X}_0$. Within a typical population, some of the errors
of estimate will be positive, others will be negative. If the
assumption is made that for the population being considered the
errors of estimate form a normal distribution with the mean at
zero, the “average” magnitude may be measured by the standard
deviation of this distribution. This statistic is called the
standard error of estimate. The probable error of estimate may
be obtained by multiplying by the constant .6745.
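The empirical procedure described above can be sketched as follows. This is a minimal illustration, not the authors' own computation: the function names are hypothetical, and the errors are assumed to have a mean of zero, so their standard deviation is taken as the root mean square of the errors with N as the divisor.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # constant quoted in the text


def standard_error_of_estimate(actual, predicted):
    """Root mean square of the errors of estimate (actual minus predicted).

    Assumes the errors have a mean of zero, so this equals their
    standard deviation.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def probable_error_of_estimate(actual, predicted):
    """Probable error: the standard error multiplied by .6745."""
    return PROBABLE_ERROR_FACTOR * standard_error_of_estimate(actual, predicted)
```

For instance, criterion measures of 3, 5, 7, 9 set against predictions of 2, 6, 6, 10 give errors of +1, -1, +1, -1, hence a standard error of 1 and a probable error of .6745.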

1 The symbol ∞ designates “infinity” and $X_\infty$ designates the mean of an
infinite number of measures of the same individual, none of which involves a
systematic error.

Since we have two types of formulae for making predictions
when the relationship is linear and two criteria with which they
may be compared, there are four standard errors of estimate.
The method of deriving the formulae for these standard errors
of estimate may be illustrated by considering the case in which
the predictions are made from the regression equation and $X_0$
is taken as the criterion. It is assumed that $M_0 = \bar{M}_0$, which is
equivalent to saying that neither $X_0$ nor $X_1$ in the theoretical
population involves a systematic error or that their systematic
errors are equivalent. It is also assumed that the regression of
$X_0$ upon $X_1$ is linear. Since $\bar{M}_0 = M_0$,
$X_0 - \bar{X}_0 = (X_0 - M_0) - (\bar{X}_0 - \bar{M}_0) = x_0 - \bar{x}_0$,
and we may deal with deviations instead of raw measures. The
regression equation in terms of deviation measures is:

$$\bar{x}_0 = r_{01}\frac{\sigma_0}{\sigma_1}x_1$$

Hence the error of estimate, $x_0 - \bar{x}_0$, is equal to

$$x_0 - r_{01}\frac{\sigma_0}{\sigma_1}x_1$$


The standard deviation of the distribution of the errors of
estimate, or the standard error of estimate,¹ is represented by
the symbol $\sigma_{0.1}$.

$$\sigma_{0.1}^2 = \frac{\Sigma(x_0 - \bar{x}_0)^2}{N} = \frac{1}{N}\Sigma\left(x_0 - r_{01}\frac{\sigma_0}{\sigma_1}x_1\right)^2$$

$$= \frac{\Sigma x_0^2}{N} - 2r_{01}\frac{\sigma_0}{\sigma_1}\frac{\Sigma x_0 x_1}{N} + r_{01}^2\frac{\sigma_0^2}{\sigma_1^2}\frac{\Sigma x_1^2}{N}$$

Since $\Sigma x_0^2 / N = \sigma_0^2$, $\Sigma x_0 x_1 / N = r_{01}\sigma_0\sigma_1$, and
$\Sigma x_1^2 / N = \sigma_1^2$, the right-hand side reduces to
$\sigma_0^2(1 - r_{01}^2)$, and

$$\sigma_{0.1} = \sigma_0\sqrt{1 - r_{01}^2}$$

The expression $\sqrt{1 - r^2}$, which appears in a number of equations,
has been called the coefficient of alienation² and is designated
by $k$. Hence we may write $\sigma_{0.1} = \sigma_0 k_{01}$.

The standard error of estimate ($\sigma_{0.1}$) gives the magnitude
of the error that will be exceeded in approximately one-third
of the predictions. The probable error of estimate
($PE_{0.1} = .6745\sigma_0\sqrt{1 - r_{01}^2}$) is more easily interpreted because it gives
the magnitude of the error that will be exceeded in one-half of
the predictions.

1 The ordinary formula for the standard deviation is $\sigma = \sqrt{\Sigma x^2 / N}$, where $x$
represents the deviations from the mean. In the equation given here, $x$ is
replaced by the expression for the error of estimate and both sides are squared.

2 Kelley, T. L. “Principles Underlying the Classification of Men,” Journal
of Applied Psychology, 3:50-67, March, 1919.
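The result of the derivation can be checked numerically. The sketch below is an illustration only, with hypothetical function and variable names: it computes the errors of estimate directly from the linear regression equation and compares their standard deviation with $\sigma_0\sqrt{1 - r_{01}^2}$, using N as the divisor throughout, as in the text.

```python
import math


def check_standard_error(x1, x0):
    """Return (direct, formula) values of the standard error of estimate.

    direct  : root mean square of X0 minus the regression prediction
    formula : sigma_0 * sqrt(1 - r01^2)
    """
    n = len(x0)
    m1 = sum(x1) / n
    m0 = sum(x0) / n
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    s0 = math.sqrt(sum((b - m0) ** 2 for b in x0) / n)
    r01 = sum((a - m1) * (b - m0) for a, b in zip(x1, x0)) / (n * s1 * s0)
    # Predictions from the regression equation in raw-score form.
    predicted = [m0 + r01 * (s0 / s1) * (a - m1) for a in x1]
    direct = math.sqrt(sum((b - p) ** 2 for b, p in zip(x0, predicted)) / n)
    formula = s0 * math.sqrt(1.0 - r01 ** 2)
    return direct, formula
```

With the made-up data X1 = 1, 2, 3, 4, 5 and X0 = 2, 1, 4, 3, 5 the correlation is .80 and both values agree, as the derivation requires.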

If the estimates are compared with true measures of the dependent
variable, the error of estimate is $X_\infty - \bar{X}_0$. The corresponding
standard error of estimate is given by the equation

$$\sigma_{\infty.1} = \sigma_\infty\sqrt{1 - r_{\infty 1}^2}$$

$$\sigma_\infty = \sigma_0\sqrt{r_{00}} \quad \text{(See page 151)}$$

$$r_{\infty 1} = \frac{r_{01}}{\sqrt{r_{00}}} \quad \text{(See page 148)}$$

Substituting these values in the above equation and simplifying,
we have

$$\sigma_{\infty.1} = \sigma_0\sqrt{r_{00} - r_{01}^2}$$

We may also write

$$PE_{\infty.1} = .6745\sigma_0\sqrt{r_{00} - r_{01}^2}$$

If the predictions are made from the formula

$$\bar{X}_0 = \frac{\sigma_0}{\sigma_1}(X_1 - M_1) + M_0$$

the corresponding standard errors of estimate are given by the
expressions $\sigma_0\sqrt{2 - 2r_{01}}$ and $\sigma_0\sqrt{1 + r_{00} - 2r_{01}}$.

Thus, we have four standard errors of estimate, two for predictions
made from the linear regression equation and two for
the use of transformed measures of the independent variable as
predictions. Each is correct but the last two¹ are seldom used
because we are usually concerned with predictions made from
the linear regression equation.

1 No symbols have been proposed for designating them.

Accuracy of prediction when there are two or more independent
variables. When the predictions of $X_0$ are obtained
from two or more independent variables by means of a multiple
regression equation, the coefficient of correlation between these
predictions ($\bar{X}_0$) and $X_0$ is given by the formula¹

$$R_{0.123 \ldots n} = \sqrt{1 - [(1 - r_{01}^2)(1 - r_{02.1}^2)(1 - r_{03.12}^2) \cdots (1 - r_{0n.12 \ldots (n-1)}^2)]}$$

The equation for the standard error of estimate is

$$\sigma_{0.123 \ldots n} = \sigma_0\sqrt{1 - R_{0.123 \ldots n}^2} = \sigma_0\sqrt{(1 - r_{01}^2)(1 - r_{02.1}^2) \cdots (1 - r_{0n.12 \ldots (n-1)}^2)}$$

We may also write

$$\sigma_{\infty.123 \ldots n} = \sigma_0\sqrt{r_{00} - R_{0.123 \ldots n}^2}$$

when $\bar{X}_0$ is taken as the estimate of $X_\infty$.

If the transformed weighted sum of the independent variables
is used as the prediction, the corresponding standard errors of
estimate would be obtained by replacing the coefficient of
multiple correlation by the coefficient of correlation between $X_0$
and the weighted sum.
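The expressions above lend themselves to a short computational sketch. The function names and dictionary keys below are hypothetical labels chosen for illustration (the text notes that no symbols have been proposed for two of the four standard errors); the formulas are those stated in the text.

```python
import math


def standard_errors(sigma0: float, r01: float, r00: float) -> dict:
    """Four standard errors of estimate for one independent variable.

    sigma0 : standard deviation of the dependent variable
    r01    : correlation between predictor and fallible criterion
    r00    : reliability coefficient of the criterion
    """
    if r01 ** 2 > r00:
        raise ValueError("r01 squared may not exceed r00")
    return {
        "regression_vs_measured": sigma0 * math.sqrt(1.0 - r01 ** 2),
        "regression_vs_true": sigma0 * math.sqrt(r00 - r01 ** 2),
        "transformed_vs_measured": sigma0 * math.sqrt(2.0 - 2.0 * r01),
        "transformed_vs_true": sigma0 * math.sqrt(1.0 + r00 - 2.0 * r01),
    }


def multiple_correlation(successive_partials) -> float:
    """R from r01, r02.1, r03.12, ... via the product of (1 - r^2) terms."""
    product = 1.0
    for r in successive_partials:
        product *= 1.0 - r ** 2
    return math.sqrt(1.0 - product)
```

With a single coefficient the multiple correlation reduces to that coefficient, and adding a partial correlation of zero leaves it unchanged, which is a quick check on the product form.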

1 It should be noted that $R_{0.123 \ldots n}$ is merely $r_{X_0\bar{X}_0}$.

Procedures for interpreting a standard or probable error of
estimate for a group of predictions. The fact that the standard
error of estimate has been found to be a certain magnitude such
as 3.7 or 18.3 is not very meaningful. One plan of interpretation
is to determine the per cent of errors of estimate that are not
greater than a certain amount. This plan is useful when the
predictions are made in terms of a small number of categories
such as school marks. For example, the interpretation may be
made in terms of the per cent of marks correctly predicted,
the per cent of predictions in error by one step of the scale, and
so on. A second plan of interpretation is to compare the obtained
standard error of estimate with the standard error of estimate
for a criterion prediction. Two such bases of comparison have
been employed: (1) chance or random prediction and (2) the
mean of the dependent variable taken as the prediction for all
members of the population. The result of these comparisons
has been called “efficiency of prediction.” In both cases the
interpretation is related to the coefficient of correlation between
$X_0$ and $X_1$. This relationship makes possible the determination
of the accuracy or efficiency of the predictions from merely the
coefficient of correlation.

Per cent of errors of estimate that are within a specified
limit. When the predictions have been made from the regression
equation and $X_0$ is taken as the criterion, the errors of estimate
for a typical population are assumed to form a normal distribution
whose standard deviation (standard error of estimate) is
given by the formula

$$\sigma_{0.1} = \sigma_0\sqrt{1 - r_{01}^2}$$

The problem is to determine the per cent of this distribution
that lies within a specified distance from the mean, which is
zero. This distance is commonly described in terms of the scale
on which $X_0$ and consequently the predictions are expressed.
If $\sigma_0$ is taken as a unit of measurement, the specified limits may
be expressed as $\pm m\sigma_0$. Solving the formula for the standard
error of estimate for $\sigma_0$, we have

$$\sigma_0 = \frac{\sigma_{0.1}}{\sqrt{1 - r_{01}^2}}$$

Multiplying both sides by $m$, we obtain

$$m\sigma_0 = \frac{m}{\sqrt{1 - r_{01}^2}}\,\sigma_{0.1} = \frac{m}{k_{01}}\,\sigma_{0.1}$$

The specified limit $m\sigma_0$ is thus equivalent to $\frac{m}{k_{01}}\sigma_{0.1}$. The per
cent of the predictions whose errors of estimate fall within
$\pm m\sigma_0$ may be found by dividing $m$ by $\sqrt{1 - r_{01}^2}$ or $k_{01}$, locating
this quotient as a deviation in a table¹ of the areas under the
normal probability curve corresponding to deviations from the
mean, and multiplying the corresponding area by 2. The per
cents for various values of $m$ and $r_{01}$ are given in Table XI.

1 For example, Holzinger, K. J. Statistical Tables for Students in Education
and Psychology. Chicago: University of Chicago Press, 1925. Table XI. For
description of this table and its uses see pages 80-81 of this book.

Table XI. Showing for Various Values of $r_{01}$, the Per Cent of
Predictions Whose Error Is Not Greater than the Amount
Indicated

Per Cent of Predictions Whose Error of Estimate Is Not Greater
than the Amount Indicated (column headings are limits in units of $\sigma_0$)

r01     .1    .2    .3    .4    .5    .6    .8   1.0  1.25   1.5   2.0   2.5
.00      8    16    24    31    38    45    58    68    79    87    95    99
.10      8    16    24    31    38    45    58    68    79    87    96    99
.20      8    16    24    32    39    46    59    69    80    87    96    99
.30      8    17    25    33    40    47    60    71    81    88    96    99
.40      9    17    26    34    42    49    62    72    83    90    97    99
.50      9    18    27    36    44    51    64    75    85    92    98   100
.55     10    19    28    37    45    53    66    77    87    93    98   100
.60     10    20    29    38    47    55    68    79    88    94    99   100
.65     11    21    31    40    49    57    71    81    90    95    99   100
.70     11    22    33    42    52    60    74    84    92    96    99   100
.75     12    24    35    45    55    64    77    87    94    98   100   100
.80     13    26    38    50    59    68    82    90    96    99   100   100
.85     15    30    43    55    66    75    87    94    98   100   100   100
.90     18    35    51    64    75    83    93    98   100   100   100   100
.95     25    48    66    80    89    95    99   100   100   100   100   100
1.00   100   100   100   100   100   100   100   100   100   100   100   100


The application of this technique may be illustrated by
assuming a five-point marking system (A, B, C, D, and E) in
which the distribution of the marks conforms to the normal
probability curve. If the total range is taken as $5\sigma$ and each mark
represents a range of $1.00\sigma$, the limit of the error of estimate
for marks predicted correctly would be $.50\sigma$. For those in error
by not more than one step, the limit would be $1.50\sigma$. If the
coefficient of correlation between the basis of prediction and the
marks to be predicted is .60, $k_{01} = .80$. Hence, the magnitude of
errors for marks predicted correctly would be

$$\frac{.50}{.80}\,\sigma_{0.1} = .625\sigma_{0.1}$$

When this distance is measured in both directions from the
mean, 47 per cent of the area under the normal probability curve
is marked off. Hence, when $r_{01} = .60$ and the assumptions are
satisfied, 47 per cent of the marks will be predicted correctly.
By a similar procedure it is found that 94 per cent will be
predicted correctly or be in error by only one step.
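The table lookup used in this illustration can be replaced by a direct computation of the normal area. The sketch below is one way of doing so; the function name is hypothetical, and the error function from Python's standard library stands in for the printed table of areas. It assumes, as the text does, normally distributed errors of estimate with the mean at zero.

```python
import math


def percent_within(m: float, r01: float) -> float:
    """Per cent of errors of estimate not greater than m * sigma_0.

    The limit m * sigma_0 equals (m / k01) * sigma_0.1, where
    k01 = sqrt(1 - r01^2); the area within +/- z of the mean of a
    normal curve is erf(z / sqrt(2)).
    """
    k01 = math.sqrt(1.0 - r01 ** 2)
    if k01 == 0.0:
        return 100.0  # perfect correlation: every prediction is exact
    z = m / k01
    return 100.0 * math.erf(z / math.sqrt(2.0))
```

For $r_{01} = .60$ the function returns about 47 for a limit of $.50\sigma$ and about 94 for a limit of $1.50\sigma$, the two per cents quoted in the illustration.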

The values given in Table XI are for $X_0$ as the criterion.
The per cent of errors of estimate that do not exceed a specified
magnitude may be determined also for $X_\infty$ as the criterion. The
only change in the procedure is that $m$ is divided by $\sqrt{r_{00} - r_{01}^2}$
instead of by $\sqrt{1 - r_{01}^2}$. If the reliability of the criterion is .75
and $r_{01} = .60$, 57 per cent of the predictions will be within $.50\sigma_0$
of the true criterion. The per cents corresponding to various
values of $r_{01}$ and $m$ cannot be conveniently given because a
separate table would be required for each value of $r_{00}$.

Efficiency of prediction when $X_0$ is the criterion. The second
method of interpreting the standard error of estimate is to
compare it with the standard error of estimate for chance or
random predictions, or for the mean of the dependent variable
($M_0$) taken as the prediction for all members of the population.
When $M_0$ is used as the prediction¹ the standard error of estimate
is $\sigma_0$. Using this as a norm, the reduction or improvement
in the standard error of the estimate when the predictions are
made from the regression equation is given by the expression
$\sigma_0 - \sigma_0\sqrt{1 - r_{01}^2}$. The per cent of improvement may be found
by dividing this by $\sigma_0$:

$$\frac{\sigma_0 - \sigma_0\sqrt{1 - r_{01}^2}}{\sigma_0} = 1 - \sqrt{1 - r_{01}^2}$$

The expression² $(1 - \sqrt{1 - r_{01}^2})$ has been called “efficiency of
prediction” or “predictive index” and E, Jp, and PI have been
used as designations for it. As a means of avoiding confusion
with the expression developed in the following paragraph, $E_M$
is suggested as a symbol.

1 The use of $M_0$ as the prediction for all members of the population corresponds
to $r_{01} = 0$. This is obvious from the regression equation

$$\bar{X}_0 = r_{01}\frac{\sigma_0}{\sigma_1}X_1 + M_0 - r_{01}\frac{\sigma_0}{\sigma_1}M_1$$

If $r_{01} = 0$, the two terms involving it become zero and $\bar{X}_0 = M_0$.

2 The factor, 100, is sometimes inserted as a multiplier to enable us to write the
values obtained as per cents without a decimal point.

In order to serve as a convenient criterion, chance or random
predictions must have the same mean and the same standard
deviation as $X_0$. For example, if letter grades of A, B, C, D, and
E are being predicted, this requirement means that the number
of A’s predicted is equal to the number of A’s actually received,
the number of B’s predicted is equal to the number of B’s actually
received, and so on. Estimates of the future status of a
group of students by a teacher who is acquainted with them will
probably not be random predictions¹ but will be correlated
with $X_0$. If the random predictions are represented by $X_g$,
the error of estimate will be $X_0 - X_g$.

$$\sigma_{0.g} = \sqrt{\frac{\Sigma(X_0 - X_g)^2}{N}} = \sqrt{\sigma_0^2 + \sigma_g^2}$$

Since $\sigma_g$ is assumed to be equal to $\sigma_0$,

$$\sigma_{0.g} = \sqrt{2\sigma_0^2}$$

The reduction in the standard error of estimate is

$$\sqrt{2\sigma_0^2} - \sigma_0\sqrt{1 - r_{01}^2}$$

The per cent of improvement is obtained by dividing this
expression by $\sqrt{2\sigma_0^2}$:

$$\frac{\sqrt{2\sigma_0^2} - \sigma_0\sqrt{1 - r_{01}^2}}{\sqrt{2\sigma_0^2}} = 1 - \sqrt{\frac{1 - r_{01}^2}{2}}$$

The expression $1 - \sqrt{\frac{1 - r_{01}^2}{2}}$ gives the improvement
over “pure guesses” or chance predictions and may be designated
by $E_g$.

1 Kaulfers, W. V. “A Guessing Experiment in Foreign Language Prognosis,”
School and Society, 32:535-38, October 18, 1930.
In this study a number of teachers were instructed to guess the final grades of
their students at the beginning of the third day of the semester. The correlations
of the “guesses” with the grades actually received range from .05 (N = 23) to
.73 (N = 75), and eight of the seventeen coefficients are above .50. It is, of
course, not known to what extent the estimates made influenced the final marks,
but the resulting correlations indicate very clearly that the predictions made on
the basis of only very limited acquaintance with pupils are likely to be far from
“pure” guesses. In an unreported study by the senior author, estimates of final
grades in chemistry were made by a graduate student, with no teaching
experience, on the basis of very limited casual observation of the students in the
laboratory. The correlation of the estimates with final marks was .28 (N = 94).
When estimates were determined by arranging the names of the students in
alphabetical order and assigning A’s to the first ones in the list, B’s to those
next, and so on, the correlation between these estimates and the final marks
was -.037 in one case and -.088 in a second.

If the predictions are made by means of the formula

$$\bar{X}_0 = \frac{\sigma_0}{\sigma_1}(X_1 - M_1) + M_0$$

the standard error of estimate is $\sigma_0\sqrt{2 - 2r_{01}}$, and the corresponding
measures of the efficiency of prediction are $1 - \sqrt{2 - 2r_{01}}$
and $1 - \sqrt{1 - r_{01}}$. These measures are new and $P_M$ and $P_g$ are
suggested as symbols.

Thus we have two formulae for the “efficiency of prediction”
when estimates are made from the linear regression equation,
one giving the per cent of improvement over using the mean as
the prediction for all members of the population and the other
giving the per cent of improvement over “pure guesses.” We
also have two corresponding measures of the efficiency of
prediction when the estimates are made from

$$\bar{X}_0 = \frac{\sigma_0}{\sigma_1}(X_1 - M_1) + M_0$$

For $r_{01} = .60$ these measures of efficiency of prediction are .20,
.43, .11, and .37. All of these measures are correct but each
must be properly interpreted. It is unfortunate that $E_M$, which
is commonly used, has been interpreted as the “improvement
over chance in prediction.”¹ The result has been a false
impression in regard to the predictive value of prognostic measures.
Table XII gives both $E_M$ and $E_g$ for various values of $r_{01}$.
These measures of predictive efficiency are related as follows:

$$E_g = .7071E_M + .2929$$

$$E_M = 1.4142E_g - .4142$$

It is, of course, permissible to use any measure of predictive
efficiency provided it is correctly interpreted, but the present
writers are of the opinion that our thinking relative to the
predictive value of prognostic measures will be facilitated by
employing

1 For example, see Holzinger, K. J. Statistical Methods for Students in
Education. Boston: Ginn and Company, 1928, p. 166. The symbol Jp is used instead
of $E_M$.

The only comment on this erroneous interpretation that has come to the
attention of the present writers is in a recent article:

Douglass, H. R. “Some Observations and Data on Certain Methods of
Measuring the Predictive Significance of the Pearson Product-Moment Coefficient of
Correlation,” Journal of Educational Psychology, 25:225-31, March, 1934.

A few persons have correctly understood the formula $E_M = 1 - \sqrt{1 - r_{01}^2}$.
For example, see Bailor, E. M. “Content and Form in Tests of Intelligence,”
Teachers College, Columbia University Contributions to Education, No. 162. New
York: Bureau of Publications, Teachers College, Columbia University, 1924,
p. 25.

In view of the fact that Walker in Studies in the History of Statistical Method,
p. 186, credits Bailor with originating the term “predictive index” and
presumably the formula, it is strange that the basis of its derivation has been
overlooked by so many who have used it.

Table XII. Efficiency of Prediction of $X_0$ by Means of Regression
Equation, for Various Values of $r_{01}$ or $R_{0.12 \ldots n}$
(values of $E_M$ and $E_g$ are in per cents)

r01 or R    E_M*    E_g†        r01 or R    E_M     E_g
  .00        0      29.3          .55        16      41
  .05        0.1    29.4          .60        20      43
  .10        0.5    29.6          .65        24      46
  .15        1.1    30.1          .70        29      50
  .20        2.0    30.7          .75        34      53
  .25        3.2    31.5          .80        40      58
  .30        4.6    32.5          .85        47      63
  .35        6      34            .866       50      65
  .40        8      35            .90        56      69
  .45       11      37            .95        69      78
  .50       13      39           1.00       100     100

* Calculated by means of formula $E_M = 1 - \sqrt{1 - r_{01}^2}$

† Calculated by means of formula $E_g = 1 - \sqrt{\frac{1 - r_{01}^2}{2}}$

Values of $P_M$ and $P_g$ are given in Table XIII. It is interesting
to note that the value of $P_M$ is negative for $r_{01}$ less than .50.
This means that transformed values of $X_1$ are less efficient as
predictions than $M_0$ used as the prediction for all members of
the population. Comparison of Tables XII and XIII will reveal
the superiority of predictions made from the regression
equation.
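The four measures of efficiency discussed above depend only on the coefficient of correlation, so they are easily computed together. The sketch below is illustrative; the function name and dictionary keys are hypothetical labels for the symbols $E_M$, $E_g$, $P_M$, and $P_g$, and the formulas are those given in the text and in the table footnotes.

```python
import math


def efficiency_measures(r01: float) -> dict:
    """Efficiency of prediction, as proportions, for a given correlation.

    E_M, E_g : predictions from the regression equation, compared with
               the mean and with pure guesses respectively
    P_M, P_g : transformed values of X1 used as predictions, compared
               with the same two norms
    """
    if not 0.0 <= r01 <= 1.0:
        raise ValueError("r01 is expected to lie between 0 and 1")
    k01 = math.sqrt(1.0 - r01 ** 2)  # coefficient of alienation
    return {
        "E_M": 1.0 - k01,
        "E_g": 1.0 - k01 / math.sqrt(2.0),
        "P_M": 1.0 - math.sqrt(2.0 - 2.0 * r01),
        "P_g": 1.0 - math.sqrt(1.0 - r01),
    }
```

For a correlation of .60 the four values round to .20, .43, .11, and .37, and $P_M$ is zero at a correlation of .50 and negative below it.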

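Efficiency of prediction when $X_\infty$ is the criterion. If the
measures of $X_0$ are fallible, i.e., involve variable errors of
measurement, and the predictions are to be compared with true
measures of the dependent variable, $X_\infty$ is the criterion. If $M_0$
is taken as the prediction for each member of the population,
the standard error of estimate, the standard deviation of $X_\infty - M_0$,
is $\sigma_\infty$, which is equal to $\sigma_0\sqrt{r_{00}}$. Hence, we have
$\sigma_{\infty.M} = \sigma_0\sqrt{r_{00}}$.

If random predictions $X_g$ are compared with the true measures,
the error of estimate will be $X_\infty - X_g$.

$$\sigma_{\infty.g} = \sqrt{\frac{\Sigma(X_\infty - X_g)^2}{N}} = \sqrt{\sigma_\infty^2 + \sigma_g^2}$$

Since $\sigma_\infty = \sigma_0\sqrt{r_{00}}$ and $\sigma_g = \sigma_0$,

$$\sigma_{\infty.g} = \sqrt{\sigma_0^2 r_{00} + \sigma_0^2} = \sigma_0\sqrt{1 + r_{00}}$$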

Table XIII. Efficiency of Prediction of $X_0$, Where $\frac{\sigma_0}{\sigma_1}x_1$ Is Taken
as Evidence of $x_0$, for Various Values of $r_{01}$
(values of $P_M$ and $P_g$ are in per cents)

r01     P_M*    P_g†        r01     P_M     P_g
.00     -41      0          .55       5      33
.05     -38      2.5        .60      11      37
.10     -34      5.1        .65      16      41
.15     -30      7.8        .70      23      45
.20     -26     10.6        .75      29      50
.25     -22     13          .80      37      55
.30     -18     16          .85      45      61
.35     -14     19          .90      55      68
.40     -10     23          .95      68      78
.45      -5     26         1.00     100     100
.50       0     29

* Calculated by means of formula $P_M = 1 - \sqrt{2 - 2r_{01}}$

† Calculated by means of formula $P_g = 1 - \sqrt{1 - r_{01}}$

Comparing $\sigma_{\infty.1} = \sigma_0\sqrt{r_{00} - r_{01}^2}$ with these criterion standard
errors of estimate we obtain¹

$$E_{\infty M} = 1 - \frac{\sqrt{r_{00} - r_{01}^2}}{\sqrt{r_{00}}}$$

$$E_{\infty g} = 1 - \frac{\sqrt{r_{00} - r_{01}^2}}{\sqrt{1 + r_{00}}}$$

From the standard error of estimate $\sigma_0\sqrt{1 + r_{00} - 2r_{01}}$ two
additional measures of predictive efficiency are obtained:

$$P_{\infty M} = 1 - \frac{\sqrt{1 + r_{00} - 2r_{01}}}{\sqrt{r_{00}}}$$

$$P_{\infty g} = 1 - \frac{\sqrt{1 + r_{00} - 2r_{01}}}{\sqrt{1 + r_{00}}}$$

Table XIV gives $E_{\infty g}$ for various values of $r_{00}$ and $r_{01}$ or
$R_{0.123 \ldots n}$. The column for $r_{00} = 1.00$ gives the values for
$E_g$. Comparison of a value in this column with the corresponding
ones in preceding columns shows how much more efficiently we
are able to predict true measures of the criterion than fallible
measures of it.

For example, if the coefficient of correlation between scores
on an aptitude test and the marks received in a course ($r_{01}$)
is .60, the improvement over chance prediction of the marks
is .43. If the coefficient of reliability of the marks ($r_{00}$) is .70,
$E_{\infty g} = .55$. Hence the efficiency of predicting the fallible marks
is 43 per cent better than pure guess, while the efficiency of
predicting “true” marks is 55 per cent better than pure guess.
It should be noted that except by chance, $r_{01}$, the correlation
between the measures used in prediction and the fallible criterion
scores, cannot be greater than the coefficient of reliability of
the criterion scores.

1 For a different form of the first formula, see Conrad, H. S., and Martin,
G. B. “The Index of Forecasting Efficiency for the Case of a ‘True’ Criterion,”
Journal of Experimental Education, March, 1936.

Table XIV. Efficiency of Prediction of $X_\infty$ by Means of Regression
Equations, for Various Values of $r_{01}$ or $R_{0.12 \ldots n}$ and $r_{00}$

Figure 5 shows graphically the values of $E_M$ and $E_g$, and of $E_{\infty g}$
for $r_{00}$ equal to .50 and .70. This figure affords an excellent
basis for arriving at a clear understanding of these measures of
the efficiency of prediction.

Fig. 5. Different measures of efficiency of prediction
for values of $r_{01}$

Efficiency of prediction when a multiple regression equation
is used. The preceding exposition of procedures for interpreting
the standard error of estimate of predictions has been in terms
of prediction from a single independent variable. If a multiple
regression is used $R_{0.123 \ldots n}$ may be substituted for $r_{01}$. It
should be noted, however, that the coefficient of multiple
correlation tends to exaggerate the accuracy of the predictions
and hence for precise determinations the obtained $R_{0.123 \ldots n}$
should be corrected.¹
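The measures of efficiency for a true criterion can be sketched in the same way as those for a fallible criterion. The function below is illustrative only; its name and dictionary keys are hypothetical labels for $E_{\infty M}$, $E_{\infty g}$, $P_{\infty M}$, and $P_{\infty g}$. The argument `r` may be taken as $r_{01}$ or, for a multiple regression equation, as the coefficient of multiple correlation.

```python
import math


def true_criterion_efficiency(r: float, r00: float) -> dict:
    """Efficiency of predicting true criterion measures, as proportions.

    r   : correlation of the predictions with the fallible criterion
    r00 : reliability coefficient of the criterion
    """
    if not 0.0 < r00 <= 1.0:
        raise ValueError("r00 must be greater than 0 and not above 1")
    if r ** 2 > r00:
        raise ValueError("r squared may not exceed r00")
    regression_error = math.sqrt(r00 - r ** 2)
    transformed_error = math.sqrt(1.0 + r00 - 2.0 * r)
    mean_norm = math.sqrt(r00)         # M0 used as every prediction
    guess_norm = math.sqrt(1.0 + r00)  # random predictions
    return {
        "E_inf_M": 1.0 - regression_error / mean_norm,
        "E_inf_g": 1.0 - regression_error / guess_norm,
        "P_inf_M": 1.0 - transformed_error / mean_norm,
        "P_inf_g": 1.0 - transformed_error / guess_norm,
    }
```

With a correlation of .60 and a reliability of .70 the value of `E_inf_g` rounds to .55, and setting the reliability to 1.00 returns the values for a fallible criterion, for example .43 for $E_g$.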

1 Ezekiel, M. Methods of Correlation Analysis. New York: John Wiley and
Sons, 1930, p. 177. Ezekiel gives the following correction formula:

$$\bar{R}_{0.123 \ldots n}^2 = 1 - (1 - R_{0.123 \ldots n}^2)\,\frac{n - 1}{n - m}$$

in which $n$ is the number of sets of observations in the sample and $m$ is the
number of constants in the regression equation.

Accuracy of individual predictions. The preceding pages
have dealt with the accuracy of predictions for a representative
population as a group. The probable error of estimate may be
used with individual predictions. The procedure is similar to
that of interpreting the probable error of measurement with
reference to individual scores.² When $X_0$ is the criterion and the
prediction has been made from the linear regression equation,
we may write

$$\text{Probable limits of status} = \bar{X}_0 \pm .6745\sigma_0\sqrt{1 - r_{01}^2}$$

When $X_\infty$ is taken as the criterion, we have

$$\text{Probable limits of status} = \bar{X}_0 \pm .6745\sigma_0\sqrt{r_{00} - r_{01}^2}$$

Corresponding statements may be made when transformed
values of $X_1$ are used as predictions.

2 See page 134.
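The probable limits of an individual prediction can be computed as in the following sketch. The function name, its arguments, and the numbers in the usage note are hypothetical; leaving the reliability at 1.00 corresponds to taking the measured status as the criterion, while a smaller reliability corresponds to taking the true status as the criterion.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # constant quoted in the text


def probable_limits(predicted: float, sigma0: float, r01: float, r00: float = 1.0):
    """Return (lower, upper) probable limits around a predicted status.

    The half-width is .6745 * sigma_0 * sqrt(r00 - r01^2); with
    r00 = 1 this is the probable error of estimate for the
    measured criterion.
    """
    if r01 ** 2 > r00:
        raise ValueError("r01 squared may not exceed r00")
    half_width = PROBABLE_ERROR_FACTOR * sigma0 * math.sqrt(r00 - r01 ** 2)
    return predicted - half_width, predicted + half_width
```

As a made-up example, a predicted status of 75 with a standard deviation of 10 and a correlation of .60 gives probable limits of about 69.6 and 80.4; the chances are even that the status attained falls between them.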

The probable limits of the status afford a basis for determining
the chances that the status actually attained will be
above (or below) a specified position. Suppose, for example, it
is desired to determine the chances that a student whose score
on a prognostic test is $X_1$ will achieve a standing above a
particular mark. An answer may be easily obtained from the
correlation table provided the number of students having the
specified score is sufficiently large. The theoretical answer may
be derived in general form as follows: Suppose the specified
position is represented by¹ $M_0 + m\sigma_0$. The predicted status is
given by the regression equation

$$\bar{X}_0 = M_0 + r_{01}\frac{\sigma_0}{\sigma_1}(X_1 - M_1)$$

The difference between the specified status and the predicted
status is

$$M_0 + r_{01}\frac{\sigma_0}{\sigma_1}(X_1 - M_1) - (M_0 + m\sigma_0)$$

which simplifies to

$$\sigma_0\left(r_{01}\frac{X_1 - M_1}{\sigma_1} - m\right)$$

The predicted status, however, is subject to an error of estimate.
This means that a student whose predicted status is
below the specified status may attain a status above it, and that
one whose predicted status is above may fall below. The probability
of the attained status being above the specified status
may be determined. The first step is to express the above difference
in terms of $\sigma_{0.1}$, the standard deviation of the distribution
of the errors of estimate. Since $\sigma_{0.1} = \sigma_0\sqrt{1 - r_{01}^2} = \sigma_0 k_{01}$,

$$\sigma_0 = \frac{\sigma_{0.1}}{k_{01}}$$

Substituting this value for $\sigma_0$ we have

$$\sigma_0\left(r_{01}\frac{X_1 - M_1}{\sigma_1} - m\right) = \sigma_{0.1}\,\frac{r_{01}\dfrac{X_1 - M_1}{\sigma_1} - m}{k_{01}}$$

Given a value of $X_1$ and of $m$, the fraction is easily evaluated.
The probability corresponding to it is determined by referring
to a table¹ giving areas under the normal probability curve for
various distances from the mean. If the value of the expression
is $+.6745\sigma_{0.1}$, the chances that the attained status will be above
the specified status are 75 in 100. If the value is $-.6745\sigma_{0.1}$,
the chances that the attained status will be above the specified
status are 25 in 100. Conversely, the chances that the attained
status will be below the specified status are in the first case 25
in 100 and in the second 75 in 100. If the value is $+1.175\sigma_{0.1}$,
the chances are 88 in 100 that the attained status will be above
the specified status.

1 The value of $m$ may be either positive or negative.
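The relationship developed above can be sketched as a small computation in which the normal distribution from Python's standard library takes the place of the printed table of areas. The function names and arguments are hypothetical, and the sketch assumes normally distributed errors of estimate and a correlation numerically less than 1.

```python
import math
from statistics import NormalDist


def standardized_margin(x1: float, m1: float, sigma1: float, r01: float, m: float) -> float:
    """Predicted status minus specified status, in units of sigma_0.1.

    Evaluates (r01 * (X1 - M1) / sigma1 - m) / k01.
    """
    if abs(r01) >= 1.0:
        raise ValueError("r01 must be numerically less than 1")
    k01 = math.sqrt(1.0 - r01 ** 2)
    return (r01 * (x1 - m1) / sigma1 - m) / k01


def chance_above(x1: float, m1: float, sigma1: float, r01: float, m: float) -> float:
    """Probability that the attained status exceeds M0 + m * sigma_0."""
    return NormalDist().cdf(standardized_margin(x1, m1, sigma1, r01, m))
```

A student at the mean of the prognostic test has an even chance of exceeding the mean of the criterion (m = 0), and a margin of +.6745 corresponds to 75 chances in 100, in agreement with the values stated in the text.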

Critical scores. In dealing with prognostic measures several
workers have introduced the idea of a “critical score,” i.e.,
the score below which the chances of a specified degree of
success are less than a stated probability. A critical score is most
conveniently obtained from the correlation table,² but a
theoretical critical score may be determined from the relationship
developed in the preceding paragraph. In this relationship the
variables are $m$, $X_1$, and the probability. Given $m$ and the
probability, we may determine the value of $X_1$ which will be
the critical score. Let the deviation value corresponding to the
given probability be represented by $m'\sigma_{0.1}$.

$$m'\sigma_{0.1} = \frac{r_{01}\dfrac{X_1 - M_1}{\sigma_1} - m}{k_{01}}\,\sigma_{0.1}$$

Solving for $X_1$ we have

$$X_1 = M_1 + \frac{m'k_{01} + m}{r_{01}}\,\sigma_1$$

Suppose the specified probability of failure is three in four.
The corresponding value of $m'$ is $-.6745$. Suppose also that
$m$ is $-.50$, which corresponds to defining the minimum limit of
success as a mark of C in a normally distributed system of five
marks A, B, C, D, and E. If $r_{01} = .60$, $X_1 = M_1 - 1.73\sigma_1$.
Hence the theoretical critical score for the specified conditions
is $1.73\sigma_1$ below the mean of the prognostic measures. Three
out of every four students with such a score on the prognostic
test may be expected to receive a D or E.

1 See page 80 for reference to a convenient table.

2 For an illustration see Torgerson, T. L., and Aamodt, Geneva P. “The
Validity of Certain Prognostic Tests in Predicting Algebraic Ability,” Journal
of Experimental Education, 1:277-79, March, 1933.
This illustration is for transformed values of $X_1$ used as predictions. A
similar table could be constructed for predictions from the regression equation.
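The theoretical critical score can be computed directly from the relationship solved for $X_1$. The sketch below is illustrative; the function name and arguments are hypothetical, and the inverse of the normal distribution from Python's standard library replaces the printed table in finding $m'$ from the stated probability of success.

```python
import math
from statistics import NormalDist


def critical_score(m1: float, sigma1: float, r01: float, m: float,
                   probability_of_success: float) -> float:
    """Theoretical critical score on the prognostic measure.

    Solves m' = (r01 * (X1 - M1) / sigma1 - m) / k01 for X1, where m'
    is the normal deviate whose lower-tail area equals the stated
    probability of success.
    """
    if r01 == 0.0 or abs(r01) >= 1.0:
        raise ValueError("r01 must be non-zero and numerically less than 1")
    m_prime = NormalDist().inv_cdf(probability_of_success)
    k01 = math.sqrt(1.0 - r01 ** 2)
    return m1 + sigma1 * (m_prime * k01 + m) / r01
```

With a mean of 0 and a standard deviation of 1 for the prognostic measure, a correlation of .60, m equal to -.50, and a probability of success of one in four, the function returns about -1.73, the critical score found in the text.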

Coefficient of correlation as an index of predictive efficiency.
One of the uses of the coefficient of correlation mentioned in
Chapter IV, page 116, was that of determining the predictive
values of prognostic measures. Since the formulae for efficiency
of prediction involve $r$ as a variable, the values given in
Tables XII, XIII, and XIV are essentially interpretations of
the corresponding coefficients of correlation when this statistical
technique is employed to investigate the prognostic value of a
given set of measures. For example, if $r_{01} = .60$, $E_M = .20$ and
$E_g = .43$. Hence the coefficient of correlation may be referred
to as an “index of predictive efficiency” or more simply as a
“predictive index.”

C. Efficiency of Predictions in Practice 

Magnitude of the predictive index in practice. Our interest in
prediction in education is mainly with respect to future success
either in school or in a vocational activity. The magnitude of
the obtained correlations, $r_{01}$ or $R_{0.123 \ldots n}$, varies, but the
situation may be illustrated by reviewing briefly reported studies
relating to (1) predicting success in the first year of college,
and (2) predicting teaching success.

The reader should note a significant difference between these
two prediction situations. In the first, “success in the first year
of college” is commonly thought of as the mean of the marks the
student receives, and when defined in this way valid measures
of the variable to be predicted are available. Teaching success,
on the other hand, has no generally recognized measure and, as
will be pointed out later, there is evidence that the ratings
obtained are grossly lacking in validity.

1. Predicting success in the first year of college.¹ Since Thorn-
dike² and others presented evidence to show that the traditional
college entrance examination was not a satisfactory means of
determining admission to college, the predictive value of several
items of information has been investigated. The reported
coefficients of correlation for marks received in high school
vary rather widely, probably owing largely to differences in the
populations from which data were secured.³ The values of
r range from about .30 to over .70, and the central tendency falls
within the interval .50 to .55. If only the more comprehensive
investigations are considered, it seems probable that for the
typical college the coefficient of correlation between mean high
school standing and mean mark in the freshman year is in the
neighborhood of .55. This estimate is supported by the investi-
gation of Odell,⁴ who secured the records of 1677 students who
graduated from Illinois high schools in 1924. The value of r
was found to be .55.

The range of the coefficients of correlation for intelligence
test scores and mean marks in the first year of college is somewhat


¹ This brief summary does not adequately represent the amount of research
relating to the prediction of scholastic success at the college level. The interested
reader will find a number of references in the writings by Douglass, Odell, and
Tyler. Kaulfers has reported a summary of the research relating to prognosis in
foreign languages: Kaulfers, W. V. “Present Status of Prognosis in Foreign
Language,” School Review, 39: 585-96, October, 1931.

² Thorndike, E. L. “The Future of the College Entrance Examination
Board,” Educational Review, 31: 470-83, May, 1906.

³ For a brief summary, see Douglass, H. R. “The Relation of High School
Preparation and Certain Other Factors to Academic Success at the University
of Oregon,” University of Oregon Publication, Vol. III, No. 1. Eugene: Uni-
versity of Oregon, 1931. 61 pp.

⁴ Odell, C. W. “Predicting the Scholastic Success of College Freshmen,”
University of Illinois Bulletin, Vol. 25, No. 2, Bureau of Educational Research
Bulletin, No. 37. Urbana: University of Illinois, 1927. 54 pp.

less, extending from about .30 to slightly above .60. The central
tendency is approximately .45. For some of the more appropri-
ate tests a slightly higher coefficient will be obtained from a
typical population. Age at graduation from high school, rat-
ings by the principal, and scores yielded by various instruments
for measuring personality traits¹ have been compared with
achievement in college, but the predictive value appears to be
relatively low, and the inclusion of such items in a multiple re-
gression equation does not result in very large improvements
over the predictions made from high school record and intelli-
gence test scores.

Odell² found a coefficient of multiple correlation of .58 for
mean high school mark and the score made on the Otis Self-
Administering Test of Mental Ability, Higher Examination, as
independent variables. This is only .03 larger than the simple
coefficient of correlation between mean mark in first year of
college and mean high school mark. Douglass³ reported cor-
responding coefficients of .56 and .63, using the percentile rank
on the American Council on Education Psychological Test as
the measure of intelligence. Wood,⁴ using records of only
97 students, found a multiple coefficient of .66 for mean high
school mark, score on Thorndike Intelligence Examination for
High School Graduates, and score on New York Regents' Ex-
amination. A few other investigators⁵ have secured slightly
higher coefficients. It appears, however, that considering the
practical difficulties of obtaining additional information before
the student enters college, the improvement of prediction does
not justify the use of data other than high school record and

¹ For an illustration, see Tyler, H. T. “The Bearing of Certain Personality
Factors Other Than Intelligence on Academic Success,” Teachers College, Colum-
bia University Contributions to Education, No. 468. New York: Bureau of Pub-
lications, Teachers College, Columbia University, 1931. 89 pp. This monograph
gives a brief summary of the more important previous studies and a helpful
bibliography.

² Op. cit., p. 37.

³ Op. cit., p. 48.

⁴ Wood, B. D. Measurement in Higher Education. Yonkers-on-Hudson:
World Book Company, 1923, p. 87.

⁵ See Douglass, op. cit., p. 50, for summary of findings.

score on an intelligence test designed to predict college suc-
cess.
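
The small gain from adding an intelligence test to the high school record can be illustrated with the standard formula for the multiple correlation of one dependent variable on two independent variables. The following is a minimal sketch, not a reproduction of Odell's or Douglass's calculations: the values .55 and .45 are the central tendencies quoted above, while the intercorrelation of .50 between the two predictors is a hypothetical figure chosen only for illustration, and the function name is likewise an assumption.

```python
import math


def multiple_r_two_predictors(r01: float, r02: float, r12: float) -> float:
    """Multiple correlation of variable 0 on variables 1 and 2,
    computed from the three zero-order correlations."""
    r_squared = (r01**2 + r02**2 - 2.0 * r01 * r02 * r12) / (1.0 - r12**2)
    return math.sqrt(r_squared)


r01 = 0.55  # high school marks vs. freshman marks (central value quoted in the text)
r02 = 0.45  # intelligence test vs. freshman marks (central value quoted in the text)
r12 = 0.50  # ASSUMED correlation between the two predictors; hypothetical

R = multiple_r_two_predictors(r01, r02, r12)
gain = R - r01
print(f"R = {R:.3f}; gain over the high school record alone = {gain:.3f}")
```

Under these assumed values the second predictor raises the coefficient by only a few hundredths, which is of the same order as the gains reported above. A larger assumed intercorrelation between the predictors would make the gain smaller still.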

2. Predicting teaching success. Beginning with Meriam's pi-
oneer investigation¹ there have been numerous studies of the
relation of marks in academic subjects, marks in professional
subjects, general intelligence test scores, and measures of other
traits to measures of teaching success.² The reported coefficients
of correlation between general intelligence and teaching success
range upward from approximately zero to about .45. The high-
est correlations are reported by Somers,³ who employed a
composite of the Thurstone Cycle Omnibus Test and the Trabue
Language Completion Test as the measure of intelligence. It
appears likely that for a typical population of teachers the
correlation between scores on an intelligence test and the ratings
of supervisors is not much greater than .20.

The coefficients of correlation reported for measures of
achievement in professional courses (education) and for scores
made on professional tests also vary widely, but on the average
are slightly higher. For general scholarship, mean mark in
teaching subject, or other phases of academic training, the
coefficients range upward from approximately zero to .60, re-
ported by Somers, who also found a correlation of .615 for rat-
ings of personality made at the end of the freshman year of
training. Hunt⁴ has reported coefficients ranging from .30
to .50 for scores on an aptitude test, and Morris⁵ found a


¹ Meriam, J. L. “Normal School Education and Efficiency in Teaching,”
Teachers College, Columbia University Contributions to Education, No. 1. New
York: Bureau of Publications, Columbia University, 1906, pp. 51-115.

² For a tabular summary of the “best known” studies, see Barr, A. S., and
Douglas, Lois. “The Pre-training Selection of Teachers,” Journal of Educa-
tional Research, 28: 92-117, October, 1934. The accompanying bibliography
includes 172 references.

³ Somers, T. T. “Pedagogical Prognosis,” Teachers College, Columbia Univer-
sity Contributions to Education, No. 140. New York: Bureau of Publications,
Teachers College, Columbia University, 1923. 129 pp.

⁴ Hunt, Thelma. “Measuring Teaching Aptitude,” Educational Administra-
tion and Supervision, 15: 334-42, May, 1929.

⁵ Morris, E. H. “Personal Traits and Success in Teaching,” Teachers College,
Columbia University Contributions to Education, No. 342. New York: Bureau of
Publications, Teachers College, Columbia University, 1929. 75 pp.

coefficient of .510 for scores on a Trait Index Test and practice
teaching marks.

The variation in the reported coefficients of correlation for
measures of general intelligence and other traits is doubtless
due in part to differences in the groups of teachers for which
data were secured. In most cases the group studied does not
appear to be very representative of the general population of
teachers. Another and probably more significant explanation
of the variation is the presence of variable errors in the obtained
measures of teaching success. Since the measures commonly
employed are subjective, variable errors of measurement are to
be expected, and it is likely that the estimates involve also
relatively large variable errors of validity. It seems reasonable
to say that the true criterion of a teacher's success is the total
growth of her pupils toward the objectives of the school. Meas-
ures of this criterion are difficult to secure, but studies¹ of the
correlation between ratings of teachers and certain measures
of pupil achievement indicate that the validity of the ratings of
teachers is so low as to make them practically worthless as
measures of teaching success.² The ratings made by a super-
visor are merely his estimates with reference to what he considers
good teaching to be. It is generally assumed that a more valid
measure of teaching success is secured by using a composite of
several ratings.³ The reliability of the composite may be high,⁴

¹ Crabbs, L. M. “Measuring Efficiency in Supervision and Teaching,”
Teachers College, Columbia University Contributions to Education, No. 175.
New York: Bureau of Publications, Columbia University, 1925. 98 pp.

Taylor, H. “The Influence of the Teacher on Relative Class Standing in
Arithmetic Fundamentals and Reading Comprehension,” Twenty-Seventh Year-
book of the National Society for the Study of Education, Part II. Bloomington,
Illinois: Public School Publishing Company, 1928, pp. 97-100.

² For a more extensive review of the evidence see Corey, S. M. “The Present
State of Ignorance about Factors Effecting Teaching Success,” Educational
Administration and Supervision, 28: 481-90, October, 1932.

³ Boardman reports a coefficient of .33 between general intelligence scores and
a composite of ratings. This is one of the highest that has come to the attention
of the present writers.

Boardman, C. W. “Professional Tests as Measures of Teaching Efficiency in
High School,” Teachers College, Columbia University Contributions to Education,
No. 327. New York: Bureau of Publications, Teachers College, Columbia Uni-
versity, 1928. 84 pp.

⁴ The reliability of Boardman's composite is given as .91.

but this condition does not reveal its validity. In view of our
state of ignorance concerning what constitutes good teaching,
we are not justified in assuming that the mean of any number
of ratings approaches a valid measure of teaching success.
The fact that we have no satisfactory means of measuring
teaching success makes the value of any prediction formula
problematical.¹

Dependability of an index of predictive efficiency. The
derivation of the standard error of estimate and of the subse-
quent indices of predictive efficiency is based upon the assump-
tion that the population to which the regression equation is
applied as a formula of prediction is equivalent in all essential
respects to the population from which the data for the deriva-
tion of the equation were obtained. The population to which the
formula is applied in practice is likely not to be equivalent in all
essential respects. For example, in securing data for deriving a
formula to predict the success of high school graduates in their
first year of college, one is limited to those graduates who enter
college and remain long enough to permit measurement of their
success. Such a group is not likely to be entirely representative
of high school graduates in general. Furthermore, the courses
pursued and the instructional conditions in college may vary
from institution to institution and within the same institution
from year to year. Lack of equivalence in any respect that
affects the marks received will tend to make the index of pre-
dictive efficiency lacking in dependability.

Variable errors of validity in the criterion affect the de-
pendability of the index of predictive efficiency. The probable
presence of variable errors of validity in the measures of teach-
ing success provides an explanation of the low index of predictive
efficiency. In considering the effect of variable errors of validity
in the criterion, it should be noted that their magnitude depends
upon the label attached to the predictions. If they are con-
sidered merely estimates of what has been measured, there will,

¹ A good reference relative to this point is Haggerty, M. E. “The Crux of the
Teaching Prognosis Problem,” School and Society, 35: 545-49, April 23, 1932.

of course, be no variable errors of validity. We, however, usually
wish to predict actual scholastic success, actual teaching success,
or whatever the label specifies, and hence, variable errors of
validity are usually possible. In the absence of valid measures
of the criterion, the magnitude of the errors of estimate cannot
be known and hence the dependability of the index of predictive
efficiency cannot be determined.

In deriving the formula for the standard error of estimate,
the requirement was made that $M_0 = \bar{M}_0$ so that $x_0 - \bar{x}_0$ could
be substituted for $X_0 - \bar{X}_0$. (See page 335.) This requirement
is equivalent to specifying that the measures from which the re-
gression equation is derived and the measures used in making
predictions do not involve a systematic error or that the system-
atic errors in the corresponding groups of measures are equiva-
lent. Hence, the presence of a systematic error in an inde-
pendent variable or in the dependent variable, either in the data
from which the regression equation is derived or in the measures
used in making predictions, will affect the errors of estimate
unless the systematic errors are equivalent. When the errors of
estimate are affected, the dependability of an index of predictive
efficiency will also be affected. Edgerton¹ has derived an equa-
tion for the standard error of estimate which includes the effect
of systematic errors of measurement. The formula, however, is
relatively complex and hence is seldom used.
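
The effect of a systematic error that is not equivalent in the two groups can be illustrated numerically. The sketch below is not Edgerton's formula. It assumes a single independent variable, simulated scores, and a constant error of 0.5 added to the predictor in the group to which the equation is applied; all of these values and names are hypothetical. It relies only on the identity that the mean squared error of estimate equals the variance of the errors plus the square of their mean.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n, shift=0.0):
    """Return (x1, x0): recorded predictor and criterion scores.
    `shift` is a constant systematic error added to the recorded predictor only."""
    true_x1 = rng.normal(size=n)
    x0 = 0.6 * true_x1 + rng.normal(scale=0.8, size=n)
    return true_x1 + shift, x0


# Derive the regression equation from a group with no systematic error.
x1_a, x0_a = simulate(5000)
slope, intercept = np.polyfit(x1_a, x0_a, 1)


def error_summary(x1, x0):
    """Mean, standard deviation, and root mean square of the errors of estimate."""
    errors = x0 - (intercept + slope * x1)
    return errors.mean(), errors.std(), np.sqrt(np.mean(errors**2))


# Apply it to an equivalent group, then to a group whose predictor is shifted.
mean_ok, sd_ok, rms_ok = error_summary(*simulate(5000))
mean_sh, sd_sh, rms_sh = error_summary(*simulate(5000, shift=0.5))

print(f"equivalent group: mean error {mean_ok:+.3f}, RMS error {rms_ok:.3f}")
print(f"shifted group:    mean error {mean_sh:+.3f}, RMS error {rms_sh:.3f}")
```

In the shifted group the errors of estimate no longer average zero, so the standard error of estimate computed from the derivation group understates the errors actually made.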

The dependability of an index of the predictive efficiency is
affected also by non-conformity of the data to assumptions
made in the derivation of the regression equation. For example,
the linear regression equation assumes linearity of relationship
and hence is appropriate only when this assumption is satisfied.
If it is not, the dependability of the index of predictive efficiency
will be affected.²

¹ Edgerton, H. A. “Measuring the Validity of Predicted Scores,” Journal of
Educational Psychology, 21: 388-91, May, 1930.

² For a discussion of the assumptions relating to multiple correlation, see
Ezekiel, Mordecai. “The Assumptions Implied in the Multiple Regression
Equation,” Journal of American Statistical Association, 20: 405-08, September,
1925.

In view of the various factors that may affect the depend-
ability of an index of predictive efficiency, it is apparent that
the calculated value should not be considered highly dependable
unless one is able to show that the possible influences do not
apply. The standard error of estimate and the derived indices of
predictive efficiency are based upon a number of assumptions
that are likely to be only roughly approximated when the re-
gression equation is actually applied as a formula of prediction.
In many cases, perhaps most cases, the indicated predictive
efficiency will be greater than is actually realized. Hence, it is
probably wise to discount somewhat the calculated index of
predictive efficiency. A valid measure of predictive efficiency
for a particular population may be obtained by deriving the
regression equation and applying it to this population. The dif-
ferences between the criterion measures and the predictions will
be valid errors of estimate provided the criterion measures are
accurate.

Practical minimum efficiency of prediction. An important
question relates to the minimum efficiency for which prediction
is justified as a practical procedure. The answer to this question
will be a judgment, but in making the decision, one should
clearly understand the measure of predictive efficiency employed,
and bear in mind the use that is to be made of the predictions.
In an article published in 1927 Hull¹ expressed the opinion that
when the efficiency (Em) is less than 13 per cent, the value of
making predictions is doubtful and that for efficiencies between
13 per cent and 20 per cent, the predictions are “only possibly
useful.” Hull's statement has been widely accepted, but it
should be noted that he gives no indication of understanding the
measure of efficiency of prediction (Em) that he was dealing
with. One can only speculate in regard to what his judgment
might have been if he had understood that his measure of ef-
ficiency represented the per cent of improvement over using the
mean as the estimate for each member of the population, and

¹ Hull, C. L. “The Coefficient of Correlation and Its Prognostic Significance,”
Journal of Educational Research, 15: 337, May, 1927.

that the per cent of improvement over chance prediction cor-
responding to Em equal to 13 is 43. Furthermore, he did not
consider the efficiency of predicting true measures of the crite-
rion. Examination of Table XIV shows that when r^o is .80 or
less, as is usually the case, the values of E^g are materially
larger than those for Eg. These facts, together with the inter-
pretation of $r_{01}$ in terms of per cent of predictions in error by
more than a specified amount, and especially the application of
this interpretation procedure to the lower prognostic measures,
suggest that he might have arrived at a somewhat different
opinion.
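
A measure of the per cent of improvement over using the mean as the estimate for every individual is commonly computed as $1 - \sqrt{1 - r^2}$, the proportional reduction in the standard error of estimate. The sketch below assumes that this is the measure under discussion; the derivation appears earlier in the book and is not reproduced here, and the function names are hypothetical. It tabulates the measure for several correlations and inverts it to find the correlations corresponding to efficiencies of 13 and 20 per cent.

```python
import math


def efficiency_over_mean(r: float) -> float:
    """Proportional reduction in the standard error of estimate: 1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r * r)


def r_for_efficiency(e: float) -> float:
    """Correlation required to reach efficiency e (inverse of the function above)."""
    return math.sqrt(1.0 - (1.0 - e) ** 2)


table = {r: efficiency_over_mean(r) for r in (0.20, 0.40, 0.55, 0.70)}
for r, e in table.items():
    print(f"r = {r:.2f}   efficiency = {100 * e:.1f} per cent")

r_13 = r_for_efficiency(0.13)
r_20 = r_for_efficiency(0.20)
print(f"13 per cent requires r of about {r_13:.2f}; 20 per cent requires r of about {r_20:.2f}")
```

On this definition an efficiency of 13 per cent corresponds to a correlation of about .49 and 20 per cent to a correlation of .60, so the thresholds would exclude a large share of the coefficients reviewed in this section.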

Although the calculated values of $r_{01}$ and $R_{0.12 \ldots n}$ tend to
exaggerate the predictive value of prognostic measures, the
present writers are of the opinion that predictions may be suf-
ficiently useful to justify the expense involved in calculating
them when the correlation is as low as .40. In some situations
predictions based upon even lower correlations may be worth
while. They would, however, emphasize that predictions should
always be used wisely. They do not subscribe to a pigeon-hole
policy of placement in either educational or vocational guidance.

An explanation of the inefficiency of prediction.¹ In order
to obtain a basis for considering means for improving predic-
tion, it will be helpful to inquire into the causes of inefficiency.
What we attempt to predict is commonly thought of as the re-
sultant of a number of contributing factors or causes, and the
use of the regression equation implies the assumption that the
dependent variable $x_0$ is a linear function of these causes. If
the causes are represented by² $a_1, a_2, \ldots, a_m$, this assump-
tion is expressed by writing

$$x_0 = c_{01}a_1 + c_{02}a_2 + c_{03}a_3 + \cdots$$

Although this assumption is not necessarily valid, it appears 
reasonable and may be accepted as an approximation. The cor- 

¹ This explanation is in terms of an approach developed in the next chapter.
If the reader experiences difficulty in understanding the explanation, he should
return to it after studying Chapter XI.

² This argument is given in terms of deviation measures. This, however, does
not restrict its application.

relation between the dependent variable and the independent
variables and the intercorrelations of the latter may be explained
by assuming a similar structure for the independent variables.¹
Hence, the variables of a multiple regression equation might
have a structure similar to the following:

$$
\begin{aligned}
x_0 &= c_{01}a_1 + c_{02}a_2 + c_{03}a_3 + \cdots + c_{0t}a_t + \cdots + c_{0m}a_m \\
x_1 &= c_{11}a_1 + c_{12}a_2 + c_{1s_1}a_{s_1} \\
x_2 &= c_{22}a_2 + c_{23}a_3 + c_{2s_2}a_{s_2} \\
x_3 &= c_{31}a_1 + c_{33}a_3 + c_{3s_3}a_{s_3} \\
&\;\;\vdots \\
x_n &= c_{n3}a_3 + c_{nt}a_t + c_{ns_n}a_{s_n}
\end{aligned}
$$

The causes $a_1, a_2, \ldots, a_t$ appear in one or more of the inde-
pendent variables, but the remaining causes of the dependent
variable are not found in any independent variable. Each inde-
pendent variable includes a factor that does not appear in the
dependent variable. This factor may be merely the variable
error of measurement, but usually it includes also a variable
error of validity. It may include other elements not appearing
in the dependent variable.

This analytical description of the variables involved in a re-
gression equation offers a basis for pointing out the causes of
inefficiency in prediction. In a typical situation several of the
$a$'s (causes) of the dependent variable ($x_0$) are not found in any
of the independent variables. For example, a student's success
in first year of college is probably conditioned by his roommate
and other companions, the kind and quality of instruction he
receives, his instructors' methods and policies of grading, and
other factors that are characteristics of the school community.
It is also probably influenced by personal traits that are seldom
measured. Since we are limited to those prognostic measures
that may be obtained at the time the predictions are desired, it
is obvious that estimates of success in college at the time of
graduation from high school will inevitably involve errors of
estimate due to the absence of a number of causes in the regres-

¹ See the explanation of factor analysis, pages 399 f.

sion equation. Similar statements may be made relative to
other prediction situations.

A second cause of the inefficiency of predictions made from
the regression equation is not so generally understood. A de-
pendent variable cannot be precisely expressed as a linear func-
tion of a group of independent variables when they include
factors, such as those indicated by the $a$'s with $s$ subscripts, not found
in the dependent variable. The independent variables with
which we deal in educational prediction usually include such
factors, and this condition contributes to the inefficiency of our
prediction formula. In other words, even if each cause of the de-
pendent variable appeared in at least one of the independent
variables, the regression equation would still be only a best estimate
of the relationship between the dependent variable and its
causes.

A third cause of inefficiency in prediction is due to the over-
lapping of the independent variables. If we think of the de-
pendent variable being expressed as a linear function of its
elemental causes, a prognostic measure (independent variable)
is likely to include two or more of these elemental causes, and a
given elemental cause is likely to appear in two or more of the
independent variables. The typical situation is probably similar
to that represented by the following equations.

$$
\begin{aligned}
x_0 &= 3a_1 + 5a_2 + a_3 + \cdots \\
x_1 &= 2a_1 + 3a_2 + a_{s_1} \\
x_2 &= 5a_2 + a_3 + a_{s_2} \\
x_3 &= 3a_1 + a_2 + 6a_3 + a_{s_3}
\end{aligned}
$$

Due to the presence of $a_{s_1}$, $a_{s_2}$, and $a_{s_3}$ it is very unlikely that
the weighting of $x_1$, $x_2$, and $x_3$ in the regression equation will be
such that $b_{01.23}(2a_1) + b_{03.12}(3a_1)$ will be equal to $3a_1$. Similar
statements may be made with reference to the other elemen-
tal causes. Hence the overlapping of the independent vari-
ables contributes to the inefficiency of the predictions.
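
The part played by the specific factors in the illustrative equations can be checked by simulation. The sketch below rests on assumptions that are not in the text: every elemental cause and every specific factor is drawn as an independent variable with unit variance, and the dependent variable is limited to the three causes written out explicitly, so that no cause is missing from the independent variables. The function name is hypothetical. Under these conditions any inefficiency that remains is due to the specific factors alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# ASSUMED: independent, unit-variance elemental causes and specific factors.
a1, a2, a3 = rng.normal(size=(3, n))
s1, s2, s3 = rng.normal(size=(3, n))

# Dependent variable, limited here to the three causes written out in the text.
x0 = 3 * a1 + 5 * a2 + a3


def multiple_r(y, predictors):
    """Least-squares multiple correlation of y with a list of predictors."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.corrcoef(X @ coef, y)[0, 1]


# Independent variables without, and then with, their specific factors.
pure = [2 * a1 + 3 * a2, 5 * a2 + a3, 3 * a1 + a2 + 6 * a3]
mixed = [pure[0] + s1, pure[1] + s2, pure[2] + s3]

R_pure = multiple_r(x0, pure)
R_mixed = multiple_r(x0, mixed)
print(f"R without specific factors: {R_pure:.4f}")
print(f"R with specific factors:    {R_mixed:.4f}")
```

Without the specific factors the three independent variables reproduce the dependent variable exactly, and the multiple correlation is unity. With them, prediction is imperfect even though every cause of the dependent variable is represented in the independent variables.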

Securing the most accurate predictions for a given criterion
measure. The three causes of inefficiency in prediction suggest
two points of attack for decreasing the errors of estimate. In
the first place, one should attempt to secure prognostic measures
that will include as many of the causes ($a$'s) of the dependent
variable as possible. It appears, however, that this attempt can-
not usually be completely successful because some of the causes
cannot be measured at the time the predictions are desired. The
other attack is to endeavor to secure “pure” measures of ele-
mental causes. Factor analysis, described in the following chap-
ter, appears to afford assistance in accomplishing this, but it
would be hazardous to predict the degree of success that will
eventually be attained. Random explorations to discover new
prognostic measures are not likely to be very fruitful. The
addition of an independent variable that includes no new
“cause” will improve the prediction formula only very slightly
if at all. The addition of an independent variable in which the
causal elements are minor factors will also not result in much
improvement.

Recognition of the causes of inefficiency affords an explanation
of why in predicting success in the first year of college the in-
crement of accuracy resulting from increasing the number of
independent variables is not commensurate with the additional
labor involved.¹ In this connection it should be noted that the
multiple correlation coefficient tends to exaggerate the accuracy
of predictions.² In other words, if the multiple regression equa-
tion derived from one population is applied to a second popula-
tion differing from the first only by chance, the estimated stand-
ard error of estimate will tend to be smaller than the standard
deviation of the actual errors of estimate. This phenomenon is
referred to as the shrinkage of the coefficient of multiple cor-
relation. Two formulae³ have been proposed for calculating a
value of R that will give a more accurate standard error of
¹ Hull, C. L. “The Joint Yield from Teams of Tests,” Journal of Educational
Psychology, 14: 396-406, October, 1923.

² See reference to Ezekiel on page 347.

³ One formula was derived by B. B. Smith and reported by M. J. B. Ezekiel at
the meeting of the American Mathematical Society in 1928. The other formula
was derived by Wherry. Both are given in Wherry, R. J. “A New Formula for
Predicting the Shrinkage of the Coefficient of Multiple Correlation,” Annals of
Mathematical Statistics, 2: 440-57, 1931.

estimate. Larson¹ has shown the shrinkage empirically. The
increase of the coefficient of multiple correlation, due to adding
one more variable, is small when the number of independent
variables is already large. Hence, the shrinkage, due to the
addition of a variable, may be greater than the increase. Larson
found that for his data the shrinkage exceeded the increase when
ten independent variables were included. In other words,
slightly better predictions were obtained from eight variables
than from ten variables.
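
Shrinkage of this kind can be demonstrated with simulated scores. The sketch below does not use Larson's data and does not implement the Smith or Wherry formulae. It assumes samples of 60 cases in which only the first two independent variables contribute to the criterion, and it compares the multiple correlation in the sample from which the equation is derived with the correlation between predictions and criterion in a fresh sample from the same population. All names and numerical settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)


def draw_sample(n_cases, n_predictors):
    """Simulated scores in which only the first two predictors carry any cause."""
    X = rng.normal(size=(n_cases, n_predictors))
    y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n_cases)
    return X, y


def fit(X, y):
    """Least-squares regression weights, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights


def predict(weights, X):
    return weights[0] + X @ weights[1:]


def mean_r(n_predictors, n_cases=60, trials=300):
    """Average correlation in the derivation sample and in a fresh sample."""
    derived, applied = [], []
    for _ in range(trials):
        X_a, y_a = draw_sample(n_cases, n_predictors)
        X_b, y_b = draw_sample(n_cases, n_predictors)
        w = fit(X_a, y_a)
        derived.append(np.corrcoef(predict(w, X_a), y_a)[0, 1])
        applied.append(np.corrcoef(predict(w, X_b), y_b)[0, 1])
    return float(np.mean(derived)), float(np.mean(applied))


results = {k: mean_r(k) for k in (2, 10)}
for k, (r_fit, r_new) in results.items():
    print(f"{k:2d} predictors: derivation R = {r_fit:.3f}, fresh-sample r = {r_new:.3f}")
```

In this simulation the eight added variables contain no new cause. They raise the coefficient in the derivation sample while lowering the accuracy actually realized in the fresh sample, which is the pattern described above.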

ILLUSTRATIVE STUDIES OF PREDICTIONS 

Since several studies of predicting success in first year in college and of
predicting teaching success have been referred to in the preceding pages,
the following references have been limited to other prediction situations.

Gordon, H. C. “The Specific Nature of Achievement and the Predictive
Value of the IQ.” A thesis submitted for the degree of Ph.D. in Educa-
tion. Philadelphia: University of Pennsylvania, 1931. 147 pp.

The data for this study were obtained from the twelve senior high schools
of Philadelphia. The value of the IQ as determined by the Otis Self-
Administering Test of Mental Ability for predicting success in school
subjects at the twelfth-grade level is studied critically. The report includes
a summary and a carefully selected bibliography.

Grover, C. C. “Results of an Experiment in Predicting Success in First
Year Algebra in Two Oakland Junior High Schools,” Journal of Educa-
tional Psychology, 23: 309-14, April, 1932.

The dependent variable is scores on the Columbia Research Bureau
Algebra Test, and the two independent variables are scores on the Orleans
Algebra Prognosis Test and intelligence quotients obtained by the Terman
Group Test of Mental Ability. The correlation between the prognostic test
and the measures of achievement in algebra is .61, while the multiple
correlation coefficient is .65. The study may be recommended to the reader
because of its clear description of the procedures employed.

Kaulfers, Walter. “Value of English Marks in Predicting Foreign-
Language Achievement,” School Review, 37: 541-46, September, 1929.

In this study end-semester marks in English and average mid-semester
and end-semester marks in high school Spanish were transmuted into point
scores by means of the standard score procedure. Coefficients of correlation

¹ Larson, S. C. “The Shrinkage of the Coefficient of Multiple Correlation,”
Journal of Educational Psychology, 22: 45-55, January, 1931.

were then calculated for 54 boys and 55 girls. These coefficients are respec-
tively .509 and .578.

Kelley, T. L. “Educational Guidance, an Experimental Study in the
Analysis and Prediction of Ability of High School Pupils,” Teachers
College, Columbia University Contributions to Education, No. 71. New
York: Bureau of Publications, Teachers College, Columbia University,
1914. 116 pp.

In this pioneer study, Kelley investigated the predictive value of school
marks in various subjects; teachers' estimates of intellectual ability, con-
scientiousness, emotional interest in school work, and oral expression; and
special tests in algebra, English, history, geometry, and interests. Regres-
sion equations are reported for various combinations of these variables. In
the appendix of the monograph are described techniques used in deriving
comparable measures, one of the first applications of the standard score
procedure.

Limp, C. E. “The Use of the Regression Equation in Determining the
Aptitude of an Individual,” Journal of Educational Psychology, 16: 414-
18, September, 1925. See also Hull, C. L., and Limp, C. E. “The
Differentiation of the Aptitudes of an Individual by Means of Test
Batteries,” Journal of Educational Psychology, 16: 73-88, February,
1925.

Differences between school marks in English and school marks in type-
writing for the same individuals constitute the criterion measures, or
dependent variable, in this study. The problem was that of predicting the
difference between the two aptitudes rather than that of predicting aptitude
in English or in typewriting.

Miller, L. W. “An Experimental Study of the Iowa Placement Examina-
tions,” University of Iowa Studies in Education, Vol. 5, No. 6. Iowa
City, Iowa: University of Iowa, 1930. 116 pp.

The prognostic efficiency of the Iowa Placement Examinations was
studied in this research. The results presented include “inter-part correla-
tions, reliability coefficients, correlation coefficients with first semester
grades, means, standard deviations, and an item analysis for each part.”
Both ordinary multiple regression equations and multiple regression equa-
tions in the beta or standard score form are reported in the study. An ex-
cellent summary of previous research is given.

More, G. V. D. “Prognostic Testing in Music on the College Level: An
Investigation Carried on at the North Carolina College for Women,”
Journal of Educational Research, 26: 199-212, November, 1932.

Intercorrelations for a group of 179 individuals were secured between
average music marks and scores on a number of music tests, some of which
were devised by the author. A battery was selected from the tests having
the highest correlations with the criterion. A coefficient of multiple correla-
tion of .73 was obtained. The author presents an interesting interpretation
of her findings.

Paterson, D. G., et al. Minnesota Mechanical Ability Tests. Minneapolis:
The University of Minnesota Press, 1930. 586 pp.

Seven of the tests tried out in the preliminary study yielded reliability
and validity coefficients of sufficient magnitude to justify revision and
further experimentation. The prognostic efficiency of batteries made up of
various combinations of these tests was studied.

Ross, C. C. “The Relation between Grade School Record and High School
Achievement,” Teachers College, Columbia University Contributions to
Education, No. 166. New York: Bureau of Publications, Teachers
College, Columbia University, 1925. 70 pp.

In this comprehensive study, correlations are reported between average
marks in various elementary school subjects and marks in various first-year
high school subjects. Correlations are also reported between variables which
represent composites of marks in several subjects and between high school
achievement and other factors such as elementary school deportment,
effort, and attendance. On page 42 is given an interesting graphic repre-
sentation of the accuracy of prediction of general high school average, and
achievement in Latin, English, and mathematics.

Segel, David, and Brintle, S. L. “The Relation of Occupational Interest
Scores as Measured by the Strong Interest Blank to Achievement Test
Results and College Marks in Certain College Subject Groups,” Jour-
nal of Educational Research, 27: 442-45, February, 1934.

Correlations between Strong Interest scores relative to engineering,
medicine, law, life insurance, personnel management, and purchasing
agent and measures of scholastic achievement in English, mathematics,
science, history and social science, and measures of intelligence are reported
in this study. While most of the coefficients are low, those for achievement
in mathematics and science are high enough to be significant in the case of
interest in engineering and in medicine. The authors advocate the use of
interest schedules in educational guidance, but suggest the construction of
one especially designed for this purpose.

Toops, H. A. “Predicting the Returns from Questionnaires: A Study in
the Utilization of Qualitative Data,” Journal of Experimental Education,
3:204-15, March, 1935.

An illustration of the possibilities of systematic prediction.

West, C. H. “Practical Statistics of Prediction,” Journal of Experimental
Education, 3:198-203, March, 1935.

Shows the effect of using a modification of the regression equation.

Zyve, D. L. “A Test of Scientific Aptitude,” Journal of Educational
Psychology, 18:525-46, November, 1927.

A test of scientific aptitude devised by the author was given to a group of
50 research students in physics, chemistry, and electrical engineering and
the scores obtained were correlated with judgments of competent individuals
respecting the same trait. Further study, and use of students in
non-scientific departments, indicated that the test “is a test of aptitude rather
than of training, and is capable of differentiating scientific aptitude among
highly selected and trained groups.” In addition to reporting an ordinary
regression equation, the author gives a regression equation for predicting
“true” scores.



CHAPTER XI 


IDENTIFYING AND STUDYING CAUSE AND EFFECT 
RELATIONSHIPS 

The problems of this chapter. As indicated by the above
title, this chapter deals with two problems: first, the identification
of cause and effect relationships, and second, the measurement
of the contributions from the causes in such relationships.
In connection with the second problem, the techniques of
correlation analysis will be described.

A. The Nature of Relationship

An explanation of relationship between variables. Two
correlated variables may be described as being related. One
may be a cause ¹ of the other or both may be effects of a common
cause, or causes.² When the first condition prevails, the relationship
is described as one of cause and effect. When the connection
between the two variables is due to a common cause, the relationship
is one of concomitant variation.

If $x_0$ and $x_1$ are two correlated variables expressed in deviation
form, they may be thought of as being analyzable as follows: ³

¹ Contemporary philosophical discussions reveal considerable controversy
relative to the concept of causation. It will serve our purpose, however, to define
a cause as an element in an existing situation which produces a condition
different from that which otherwise would have prevailed. For an elaboration of
this idea, see Lamprecht, S. P. “Causality,” Essays in Honor of John Dewey.
New York: Henry Holt and Company, 1929, p. 203.

² The principle that when two variables are correlated, one is the cause of the
other or they are connected by a common cause was expressed by Mill several
years before the idea of correlation as represented by a coefficient was developed
by Galton.

Mill, J. S. A System of Logic. New York: Longmans, Green and Company,
1906, p. 263. This volume was first published in 1843.

³ For proof, see Kelley, T. L. Crossroads in the Mind of Man. Stanford
University: Stanford University Press, 1928, p. 38. The statement of Kelley’s
second proposition is not identical with that given here but the proof is
applicable.


366

$$x_0 = c_0 a_{01} + b_0$$
$$x_1 = c_1 a_{01} + b_1$$

In these equations, $c_0$ and $c_1$ are constants; $a_{01}$ is a variable, but
for any pair of values of $x_0$ and $x_1$ it has the same value; and $b_0$
and $b_1$ are variables which are uncorrelated with each other and
with $a_{01}$. In other words, this theorem states that if two variables
are correlated, each one may be thought of as the weighted sum
of two uncorrelated sub-variables, or factors, one of which is
perfectly correlated with the corresponding factor of the other.
The other sub-variables are uncorrelated with each other and
with the common factor. Hence, the statement that two variables
are related may be interpreted as meaning that they include a
common factor, represented in the above analysis by $a_{01}$. This
factor may not have the same name in both variables. Hence,
the term common is not to be interpreted as meaning “the
same,” but rather as meaning that the values of $a_{01}$ in one variable
are numerically equivalent to the corresponding values in
the other.

Two given correlated variables cannot be analyzed in the
manner indicated,¹ but the analytical structure may be illustrated
by using sums of uncorrelated variables. Table XV gives
several illustrative values.²

Causal variables. The principle expressed on page 366 may
be restated as follows: The common factor in one of two correlated
variables may be a cause contributing to the other variable or
the common factor in both variables may represent contributions

¹ The best estimate of $c_0 a_{01}$ is the regression of $x_0$ on $x_1$,
$r_{01}\frac{\sigma_0}{\sigma_1}x_1$. The best estimate of $b_0$ is the error of
estimate $x_0 - r_{01}\frac{\sigma_0}{\sigma_1}x_1$ or $x_0 - \hat{x}_0$, which may be
represented by $x_{0.1}$. Hence, we may write as the best estimate of the above
analysis:

$$x_0 = \hat{x}_0 + x_{0.1}$$
$$x_1 = \hat{x}_1 + x_{1.0}$$

² The values of the uncorrelated variables $A_1$, $A_2$, and $A_3$ were obtained by
counting tosses of coins. For example, the values of $A_1$ were obtained by tossing
thirty coins and counting heads or tails. In order to neutralize the effect of
possible imperfections in the coins, the tails were counted for one-half the tosses.

from a common cause. Since measurement may be indirect,
either variable may be thought of as representing indirect
measures of the common factor or of the common cause. From
this point of view, any relationship may be thought of as one
of cause and effect, but this designation is reserved for those
cases in which the name attached to one variable defines a cause
contributing to the other. In other words, a causal variable is
one whose name designates a cause of the specified effect. For
example, the correlation between reading test scores and arithmetic
test scores is not accepted as evidence that reading ability
is a cause of arithmetical ability or vice versa. However, if one
of the tests is designated as an instrument yielding measures of
general intelligence, which would not be wholly unjustifiable,
the relationship might be designated as one of cause and
effect.


Table XV. Illustrative Values of Two Correlated Variables
Showing the Common Factor

    $X_0 = A_1 + A_2$      $X_1 = A_1 + A_3$

    35 = 15 + 20           24 = 15 +  9
    32 = 18 + 14           29 = 18 + 11
    27 = 15 + 12           28 = 15 + 13
    26 = 16 + 10           27 = 16 + 11
    33 = 18 + 15           25 = 18 +  7
    33 = 18 + 15           32 = 18 + 14
    29 = 19 + 10           33 = 19 + 14
    30 = 17 + 13           31 = 17 + 14
    28 = 15 + 13           26 = 15 + 11
    28 = 11 + 17           25 = 11 + 14
    23 = 12 + 11           25 = 12 + 13
    21 = 14 +  7           26 = 14 + 12
    33 = 18 + 15           32 = 18 + 14
    35 = 17 + 18           27 = 17 + 10
    25 = 13 + 12           22 = 13 +  9
    33 = 18 + 15           26 = 18 +  8
    31 = 18 + 13           39 = 18 + 21
    33 = 16 + 17           24 = 16 +  8
    23 = 13 + 10           21 = 13 +  8
    26 = 13 + 13           25 = 13 + 12
    29 = 17 + 12           31 = 17 + 14
    28 = 14 + 14           23 = 14 +  9

The designation of a variable as a cause does not mean that
it is wholly a cause. Usually it includes a factor (component)
that is uncorrelated with the effect, and reference to a variable
as a cause should be interpreted to mean merely that it includes
a factor that is a cause. Similarly, the designation of the variable
as an effect should be thought of as meaning that it includes
a factor that is the effect of a factor in the causal variable.
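
The structure illustrated by Table XV can be checked numerically. The following is a minimal sketch in Python, not part of the original text: it transcribes the component values of the table, rebuilds $X_0$ and $X_1$ as sums, and computes the product-moment coefficient. The helper names (`mean`, `covariance`, `pearson_r`) are illustrative assumptions, and the covariance divides by N as in the formulas of this chapter.

```python
from math import sqrt

# (A1, A2, A3) triples transcribed from Table XV; A1 is the common factor.
ROWS = [
    (15, 20, 9), (18, 14, 11), (15, 12, 13), (16, 10, 11), (18, 15, 7),
    (18, 15, 14), (19, 10, 14), (17, 13, 14), (15, 13, 11), (11, 17, 14),
    (12, 11, 13), (14, 7, 12), (18, 15, 14), (17, 18, 10), (13, 12, 9),
    (18, 15, 8), (18, 13, 21), (16, 17, 8), (13, 10, 8), (13, 13, 12),
    (17, 12, 14), (14, 14, 9),
]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance (sum of deviation products divided by N)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


a1 = [row[0] for row in ROWS]
a2 = [row[1] for row in ROWS]
a3 = [row[2] for row in ROWS]

# X0 = A1 + A2 and X1 = A1 + A3, as in the table headings.
x0 = [p + q for p, q in zip(a1, a2)]
x1 = [p + q for p, q in zip(a1, a3)]

r01 = pearson_r(x0, x1)
print(f"r01 between X0 and X1:            {r01:.3f}")
print(f"variance of the common factor A1: {covariance(a1, a1):.3f}")
print(f"covariance of X0 and X1:          {covariance(x0, x1):.3f}")
# In a sample this small the components are only approximately
# uncorrelated, so the last two figures need not coincide exactly.
print(f"r(A1, A2) = {pearson_r(a1, a2):.3f}, "
      f"r(A1, A3) = {pearson_r(a1, a3):.3f}, "
      f"r(A2, A3) = {pearson_r(a2, a3):.3f}")
```

If the three components were exactly uncorrelated, the covariance of $X_0$ and $X_1$ would equal the variance of $A_1$; the printed sample correlations among the components indicate how far this small sample departs from that ideal.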

Techniques for identifying causes. Sometimes the problem
of identifying a cause appears as one of determining the
explanation of or the reasons for a certain condition or
phenomenon. For example, an investigator may seek the explanation
of failure or maladjustment in school. Such problems may
be expressed in terms of the identification of causal variables,
but there is little advantage in doing so.

The citation from Rice’s study on page 271 illustrates the use
of the comparative survey when the causal influence of a variable
is being investigated. Another illustration is furnished by the
study of Smillie and Spencer, who measured the intelligence of
five groups of pupils differing with respect to intensity of
infestation with hookworm.¹ The data collected revealed that the
heavier the hookworm infestation the lower the intelligence
quotient, and they concluded that hookworm infestation is a
cause of mental retardation. This procedure has been called the
“method of differences.” ² The principle involved is that if two
or more populations differ in respect to a given trait or
characteristic, the causes of this difference are to be found in the other
traits or characteristics in which they also differ. The weakness
of this method is that the identification is not certain. For
example, in the investigation of Smillie and Spencer it may be
that children of low intelligence are more likely to have
hookworms than children of higher intelligence. Hence, it may be
that the infestation is an effect rather than a cause of low
intelligence or it may be that both are effects of a common cause.

¹ Smillie, W. G., and Spencer, C. R. “Mental Retardation in School Children
Infested with Hookworms,” Journal of Educational Psychology, 17:314-21,
May, 1926.

² Mill, op. cit., pp. 255-60.

In applying this procedure as a means of identifying causes,
an investigator should consider the other differences as merely
possible causes and critically examine each before accepting it
as the cause of a given trait or characteristic. If an investigator
is critical and persistent, he will frequently be able to make a
highly dependable identification. An excellent illustration of
the reasoning involved is furnished by Brownell,¹ who sought an
explanation of the difference in the performances of certain
groups of children on an arithmetic test. A detail of the
administration of the test, which at first was overlooked, turned out to
be a clue to the explanation and hence the cause. It is relatively
easy to collect a mass of data regarding differences, but the
analysis and interpretation may require much critical thinking.
Barr’s study ² illustrates the difficulties of interpretation.
Although he secured much detailed information concerning the
teaching performances, his conclusions relative to the causes of
good teaching and poor teaching are admittedly not satisfactory.

As pointed out in Chapter IX, controlled experimentation is
superior to the comparative survey, but it may not be a feasible
procedure. If Smillie and Spencer had started with several
equivalent groups of children who were free from hookworm and
then inoculated the members of some of the groups, the resulting
differences in intelligence, after a period of years, would be more
dependable evidence than that obtained. This, however, would
not be a defensible procedure.³ In other cases it would be
difficult to introduce the desired change and maintain the
experimental conditions for a sufficient period. For example, it would
be difficult to ascertain the causes of good and poor teaching by
controlled experimentation.

The “method of agreement” ⁴ involves the survey of a group,

¹ Brownell, W. A. “An Evaluation of an Arithmetic ‘Crutch,’” Journal of
Experimental Education, 2:5, September, 1933.

² Barr, A. S. Characteristic Differences in the Teaching Performance of Good
and Poor Teachers of the Social Studies. Bloomington, Illinois: Public School
Publishing Company, 1929. 127 pp.

³ This procedure was employed by Walter Reed in identifying the cause of
yellow fever. The use of human subjects in this case was justified by the
importance of the anticipated findings.

⁴ Mill, op. cit., pp. 254-55.

the members of which have a certain trait or characteristic in
common, for the purpose of determining other common traits or
characteristics. For example, in seeking the causes of pupil
failure a group of such pupils might be surveyed to determine
what other characteristics may be common to them or most of
them. If this survey is highly detailed, it represents a series of
case studies. The identification of other common traits or
characteristics does not necessarily constitute identification of causes.
The common traits or characteristics may be due to the
operation of common causes.

If the correlation between a given effect and another variable
is not zero, this fact is evidence that this variable is a possible
cause. The existence of correlation, however, is not proof of
causation. The correlation may be due to a common cause. For
example, Ezekiel ¹ cites the illustration of the correlation
between the number of automobiles passing a given point in
Washington, D. C., during each fifteen minute period from noon until
midnight and the height of the water in the Potomac River during
the same periods. The height of the water in the Potomac
River is affected by the ocean tides, which are in turn influenced
by the moon. The position of the sun has a definite influence on
the movements of people, and at any given time there is a
definite although complex relationship between the position of the
sun and that of the moon. Hence, on certain days a significant
correlation may be obtained between the numbers of automobiles
and the heights of the water, but one cannot infer the
underlying causal connection from the coefficient obtained. Hence,
when employing correlation analysis in studying a problem of
causation, it is necessary to demonstrate what variables are
causes and the direction of causation by other means.

B. Measurement of the Contributions

A measure of the contribution from a causal variable. Exam- 
ination of Table XV reveals that the ratio of the values of the 

¹ Ezekiel, Mordecai. Methods of Correlation Analysis. New York: John
Wiley and Sons, 1930, p. 349.

common factor to the corresponding values of $X_0$, the dependent
variable, is not constant. This condition suggests an “average”
of the ratio of the common factor to $X_0$, but since raw data are
frequently expressed from arbitrary zero points, a measure based
upon them will not be satisfactory. If the values of a variable
are expressed as deviations from its mean, the standard deviation
is a measure of its “magnitude” for a given population.
From the point of view of this concept of the “magnitude” of a
variable, the contribution of an independent variable may be
thought of as the effect it produces upon the variability of the
dependent variable within a given population. Since the
“magnitude” of the common factor may also be described in terms of
its variability, the problem of measuring the contribution of a
causal variable may be thought of as involving the determination
of a relation between the variability of the common factor
and that of the dependent variable.

The standard deviation is a measure of the variability of a
group of measures, but, in order to simplify some of the relations
involved, $\sigma^2$, called the variance, may be used as the measure of
the “magnitude” of a variable. In terms of the symbolism on
page 367, the fraction $\frac{c_0^2\sigma_{a_{01}}^2}{\sigma_0^2}$, called the
variance ratio, gives the per cent of the variance of the dependent
variable that is due to the common factor. Hence, the problem of
measuring the contribution from a causal variable within a given
population is defined as that of determining the value of the variance ratio.


The value of the variance ratio.¹ The argument is simplified
by expressing the two variables in standard score form:

$$z_0 = w_0 a_{01} + w_1 b_0$$
$$z_1 = v_0 a_{01} + v_1 b_1$$

¹ For a complete account of the argument of this theorem see Monroe, W. S.
“Note on the Interpretation of Coefficients of Correlation,” to be published in
the Journal of Educational Psychology.

Since the standard deviation is the unit,
$\sigma_0 = \sigma_1 = \sigma_{a_{01}} = \sigma_{b_0} = \sigma_{b_1} = 1.00$.
If $w_0 a_{01} + w_1 b_0$ and $v_0 a_{01} + v_1 b_1$ are substituted
for $x_0$ and $x_1$ in the formula $r_{01} = \frac{\sum x_0 x_1}{N\sigma_0\sigma_1}$,
we obtain ¹ $r_{01} = w_0 v_0$.

The variance of $w_0 a_{01}$ is $w_0^2$. Since the variance of $z_0$ is unity,
$w_0^2$ is also the variance ratio. Similarly $v_0^2$ is the variance ratio
which expresses the proportion of the variance of $z_1$ that is due
to the common factor. By squaring the equation $r_{01} = w_0 v_0$, we
have $r_{01}^2 = w_0^2 v_0^2$. Hence, we may state as a theorem: The square
of the product-moment coefficient of correlation is equal to the
product of the variance ratios of the two variables.

Since the variables are in terms of standard units, $w_0^2 + w_1^2 = 1.00$
and $v_0^2 + v_1^2 = 1.00$, and hence the limiting values of
both $w_0^2$ and $v_0^2$ are 0.00 and 1.00. For a given coefficient of
correlation, i.e., for a fixed value of $w_0^2 v_0^2$, the minimum value of
$w_0^2$, the variance ratio of $z_0$, occurs when $v_0^2 = 1.00$. The minimum
value is $r_{01}^2$. If $v_0^2 = w_0^2$, then $r_{01}^2 = w_0^4$ or $w_0^2 = r_{01}$.
If $v_0^2 < w_0^2$, then $w_0^2$ is greater than $r_{01}$ but the limiting value
of $w_0^2$ is 1.00. Hence, the limits of the value of $w_0^2$, the variance
ratio of $z_0$, are $r_{01}^2$ and 1.00.

Significance of this theorem. This is a theorem of considerable
importance when a coefficient of correlation is being interpreted
in terms of the degree of communality of the two variables,
which is expressed by the variance ratio. The limits of the value
of the variance ratio are $r_{01}^2$ and 1.00, and the value in a particular
case is determined by the analytical structure of the two
variables. For example, employing the symbolism of the preceding
paragraphs, if $v_1$ is zero, $v_0^2 = 1.00$, and hence $w_0^2 = r_{01}^2$. This
means that if the independent variable, $z_1$, contributes itself
completely, the value of the variance ratio is given by $r_{01}^2$. If,
on the other hand, the common factor in the independent
variable causes only a small portion of its variability, i.e., if $v_1$ is
relatively large in comparison with $v_0$, then for the same value of

¹ The variables $z_0$ and $z_1$ may be analyzed into any number of factors. The
coefficient of correlation will be the sum of the products of the coefficients of the
common factors. This theorem is useful in theoretical work. See page 401.

$r_{01}$, $w_0^2$, the variance ratio, will have a value only slightly less
than 1.00. In other words, for a given coefficient of correlation,
say .50, the value of the variance ratio may be as small as .25 or
as large as 1.00. The determination of its value within these
limits depends upon the analytical structure of the two variables.
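
The limits just stated follow directly from the relation $r_{01}^2 = w_0^2 v_0^2$. A minimal sketch in Python, assuming that relation and nothing more, shows how the variance ratio $w_0^2$ of the dependent variable moves between $r_{01}^2$ and 1.00 as the assumed value of $v_0^2$ changes. The function name `variance_ratio_w0` is a hypothetical helper, not a term from the text.

```python
from math import isclose


def variance_ratio_w0(r01, v0_sq):
    """Return w0^2 implied by r01^2 = w0^2 * v0^2 for an assumed v0^2.

    v0_sq is the share of the variance of the independent variable that
    is due to the common factor. Values that would require w0^2 > 1 do
    not correspond to any factor pattern and are rejected.
    """
    if not 0.0 < v0_sq <= 1.0:
        raise ValueError("v0^2 must lie in (0, 1]")
    w0_sq = r01 ** 2 / v0_sq
    if w0_sq > 1.0 and not isclose(w0_sq, 1.0):
        raise ValueError("no factor pattern: w0^2 would exceed 1.00")
    return w0_sq


r01 = 0.50
# Independent variable contributes itself completely (v0^2 = 1.00):
lower = variance_ratio_w0(r01, 1.00)
# Equal structure in both variables (v0^2 = w0^2 = r01):
equal = variance_ratio_w0(r01, r01)
# Common factor is only one-fourth of the independent variable's variance:
upper = variance_ratio_w0(r01, 0.25)

for label, value in [("v0^2 = 1.00", lower), ("v0^2 = 0.50", equal),
                     ("v0^2 = 0.25", upper)]:
    print(f"{label}: variance ratio w0^2 = {value:.2f}")
```

The three cases correspond to the lower limit, the equal-structure assumption, and the upper limit. Any smaller assumed $v_0^2$ is rejected because it would require $w_0^2$ to exceed 1.00.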

There is no general technique for determining the analytical
structure of two correlated variables,¹ but in certain cases, an
assumption appears reasonable. For example, in interpreting
the correlation between general intelligence test scores and the
scores on a general achievement test, Kelley ² assumed that
“that part of achievement which is not intelligence is as great
an amount as that part of intelligence which is not
achievement.” Granting this assumption, which is not materially
inconsistent with our understanding of the measures of these
two traits, we have $w_0 = v_0$ and $r_{01} = w_0^2$. If in a given
population the correlation between intelligence test scores and the
scores on a general achievement test is .81, the value of the
variance ratio is .81.

When chronological age is the independent variable, it
appears reasonable to assume, at least in some cases, that it is
contributed completely to the dependent variable. This means
that in such cases the value of the variance ratio is $r_{01}^2$. For
example, if the correlation between the scores on an achievement
test and chronological age is .40, this variable contributes only
.16 of the variance of the achievement scores. On the other
hand, in the case of measures of teaching success and intelligence
test scores, for which the correlation may be taken as .16, it
may be that the uncorrelated factor of the intelligence test
scores is large in comparison with the common factor. If this is
the case, the value of the variance ratio would be relatively
large, possibly as large as .60. This would mean that only a

¹ Tryon has developed a method for calculating the value of the variance
ratio, but since it requires two additional variables $x_2$ and $x_3$ defined so that
$a_{01}$ is the common factor for each pair of the four variables, the method is not
generally applicable. Tryon, R. C. “The Interpretation of the Coefficient of
Correlation,” Psychological Review, 36:423-24, September, 1929.

² Kelley, T. L. Interpretation of Educational Measurements. Yonkers-on-Hudson:
World Book Company, 1927, p. 195.

minor factor of general intelligence, as measured, contributes
to teaching success but this factor produces three-fifths of the
variance of the measures of teaching success.¹

Effect of a factor of heterogeneity upon the relation between
two variables. If a third variable is added to two paired
variables,² the correlation between them will be affected. For
example, suppose

$$x_0 = a_2 + a_3$$
$$x_1 = a_2 + a_4$$

If $a_1$ is added to both variables, the population is described as
heterogeneous with reference to $x_2 = a_1$, and the correlation
between $x_0 + a_1$ and $x_1 + a_1$ will be different from $r_{01}$. In other
words, the coefficient of correlation between two variables depends
upon the heterogeneity of the population with reference to other
correlated variables. The effect of a factor of heterogeneity may be
illustrated by considering the correlation between achievement test
scores as measures of the dependent variable and intelligence
test scores as measures of the independent variable. For a
population of pupils belonging to the same school grade, the
value of $r_{01}$ is frequently less than .50,³ but if the population is
taken from a sequence of grade groups, the value of $r_{01}$ for scores
yielded by the same tests will be materially greater. For a
sequence of six grades, it may be as much as .80 or .90.
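
The effect of an added component can be worked out from the variances alone. The sketch below, in Python, assumes that all components are mutually uncorrelated, so that variances of sums add and the covariance of the two variables equals the variance of what they share. The function name `corr_from_components` and the variances used are hypothetical, chosen only to give round figures.

```python
from math import isclose, sqrt


def corr_from_components(common_var, unique0_var, unique1_var):
    """Correlation of x0 = common + u0 and x1 = common + u1.

    Assumes the common part and the two unique parts are mutually
    uncorrelated, so cov(x0, x1) equals the variance of the common part.
    """
    var0 = common_var + unique0_var
    var1 = common_var + unique1_var
    return common_var / sqrt(var0 * var1)


# Hypothetical variances: a2 (common), a3 and a4 (unique) each equal 1.0.
var_a2, var_a3, var_a4 = 1.0, 1.0, 1.0
r_homogeneous = corr_from_components(var_a2, var_a3, var_a4)

# Adding an uncorrelated a1 (variance 3.0) to both variables enlarges the
# common part from var(a2) to var(a1) + var(a2).
var_a1 = 3.0
r_heterogeneous = corr_from_components(var_a1 + var_a2, var_a3, var_a4)

print(f"without a1: r = {r_homogeneous:.2f}")
print(f"with a1:    r = {r_heterogeneous:.2f}")
```

With these assumed variances the coefficient rises from .50 to .80, which is of the same order as the single-grade and six-grade values mentioned above, although the figures here are purely illustrative.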

The effect of a third variable (factor of heterogeneity) that is
positively or negatively correlated with both $x_0$ and $x_1$ is to

¹ Since in the standard score equations $w_0^2 + w_1^2 = 1.00$ and
$v_0^2 + v_1^2 = 1.00$, factor patterns may be written for a given coefficient of
correlation. For example, in the above illustration we might write

$$z_0 = \sqrt{.64}\,a_{01} + \sqrt{.36}\,b_0$$
$$z_1 = \sqrt{.04}\,a_{01} + \sqrt{.96}\,b_1$$
$$r_{01} = \sqrt{.64}\,\sqrt{.04} = .16, \qquad w_0^2 = .64$$

The reader will find it illuminating to write the factor patterns corresponding
to different assumptions in regard to the structure of two correlated variables.

² The simplified symbolism used here and in several of the following pages is
introduced as a matter of convenience. Nothing would be gained by expressing
a variable as a weighted sum of its components. The component variables
$a_1$, $a_2$, $a_3$, etc., are uncorrelated with each other.

³ The value of $r_{01}$ depends upon the tests administered.

increase the “magnitude” of the common component $a_{01}$. If the
factor of heterogeneity is positively correlated with one of the
variables and negatively with the other, its effect is to decrease
the correlation between the two variables. This case, however,
is not often encountered in educational research, and usually the
effect of heterogeneity is to cause the obtained coefficient to be
larger than the one for the corresponding homogeneous
population.

A supplementary statement in regard to the meaning of the
contribution of a causal variable. The preceding explanation
of the effect of the factor of heterogeneity upon the common
factor of a relationship emphasizes that a question concerning
the contribution of a causal variable is indefinite until a
population is specified. The variance ratio for a heterogeneous
population measures a composite of the contribution of the causal
variable and the contribution from the factor or factors of
heterogeneity. Hence, it tends to be misleading to speak of the
contribution of a given causal variable in a population that is
heterogeneous with reference to one or more factors. The
terminology may be used as a means of convenience, but an
investigator should bear in mind the nature of the contribution
when interpreting his findings.

The fact that the obtained measure of a contribution is for a
particular population suggests the desirability of agreeing upon
one or more standard populations. McCall, Kelley, and others
have proposed that in dealing with certain problems of test
construction an unselected group of twelve-year-old children
be used as the standard population.¹ The adoption of this
population as standard would be helpful in dealing with some
problems, but frequently we desire a measure of the “net”
contribution of a causal variable, that is, a measure of its
contribution in a population that is homogeneous with respect

¹ Kelley has presented data in support of the thesis that a population made up
of equal unselected groups from six consecutive grades is approximately equal
to that of an unselected age group.

Kelley, T. L. Interpretation of Educational Measurements. Yonkers-on-Hudson:
World Book Company, 1927, pp. 197 f.

to one or more other variables. Since collecting data from a
homogeneous population would frequently involve obvious
difficulties, it is appropriate to consider how the coefficient of
correlation for a desired population may be estimated from the
data collected from a heterogeneous population.

C. Partial Correlation 

Estimating the coefficient of correlation for a population
homogeneous with respect to related variables: partial
correlation.¹ The operation of the partial correlation technique may be
illustrated by referring to a study by Terman.² The correlation
between depth of chest and mental age for a large group of
gifted boys ranging in age from 9 to 14 years is given as +.582.
This population is heterogeneous with reference to chronological
age. Its correlations ³ with depth of chest and with mental age
are given as +.618 and +.941. The partial correlation formula
for three variables is

$$r_{01.2} = \frac{r_{01} - r_{02}\,r_{12}}{\sqrt{1 - r_{02}^2}\,\sqrt{1 - r_{12}^2}}$$

Let the symbols $X_0$ represent depth of chest, $X_1$ represent
mental age, and $X_2$ represent chronological age. Substituting
the given values,

$$r_{01.2} = +.002$$

This value indicates that in a population ⁴ homogeneous with
respect to chronological age there is no relationship between
depth of chest and mental age. This conclusion is in agreement
with a priori reasoning.

¹ May has derived a more general technique, which is applicable even when
the factor of heterogeneity is sex, nationality, or some other unordered series.

May, M. A. “A Method for Correcting Coefficients of Correlations for
Heterogeneity in the Data,” Journal of Educational Psychology, 20:417-23,
September, 1929.

² Terman, L. M., et al. Genetic Studies of Genius, Vol. I. Stanford, California:
Stanford University Press, 1925, p. 168.

³ Ibid., p. 156 and p. 168.

⁴ The age level of this homogeneous population is not specified. The value
+.002 may be thought of as an “average” of the coefficients of correlation for
the age levels included in the population from which the data were obtained.

In the same source, the correlation between standing height
and mental age for the same group of gifted boys is reported as
.835; between standing height and chronological age as .845; and
between mental age and chronological age as .941. Noting that
with respect to the partial correlation formula $r_{01} = +.835$,
$r_{02} = +.845$, and $r_{12} = +.941$, we have

$$r_{01.2} = +.220$$

The value +.220 indicates a slight relationship between
standing height and mental age for the group of gifted boys
when the effect of the common cause, chronological age, has
been eliminated. One should not infer, however, that mental
age is a cause of standing height or that standing height is a
cause of mental age. One important common cause has been
eliminated, but it is possible that much of the correlation
represented by the partial correlation coefficient +.220 is due to
other common causes, for example, deep seated physiological
factors which may affect both height and intellect.
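
The two substitutions above can be reproduced with a few lines of Python. This is a minimal sketch of the three-variable formula only; the function name `partial_r` is an illustrative choice, and the coefficients are those quoted in the text from Terman.

```python
from math import sqrt


def partial_r(r01, r02, r12):
    """First-order partial correlation r01.2 from zero-order coefficients."""
    return (r01 - r02 * r12) / sqrt((1 - r02 ** 2) * (1 - r12 ** 2))


# Depth of chest (0), mental age (1), chronological age (2).
chest_mental = partial_r(0.582, 0.618, 0.941)

# Standing height (0), mental age (1), chronological age (2).
height_mental = partial_r(0.835, 0.845, 0.941)

print(f"depth of chest and mental age, age partialed out:  {chest_mental:+.3f}")
print(f"standing height and mental age, age partialed out: {height_mental:+.3f}")
```

Rounded to three decimals, the results should agree with the values +.002 and +.220 given above.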

When the effects of two variables, $x_2$ and $x_3$, are to be
partialed out, there are two formulae. Either may be employed, but
by substituting the indicated values in both, a check upon the
calculations is obtained.

$$r_{01.23} = \frac{r_{01.2} - r_{03.2}\,r_{13.2}}{\sqrt{1 - r_{03.2}^2}\,\sqrt{1 - r_{13.2}^2}}$$

$$r_{01.32} = \frac{r_{01.3} - r_{02.3}\,r_{12.3}}{\sqrt{1 - r_{02.3}^2}\,\sqrt{1 - r_{12.3}^2}}$$

Ordinary coefficients of correlation are called zero-order
coefficients. When one variable has been eliminated, as for
example, $r_{01.3}$, the coefficient is designated as one of the
first-order. It will be observed that first-order coefficients are
substituted in the above equations to obtain one of the
second-order. When three or more variables are eliminated, the
formulae may be written in an increasing number of ways to secure
checks. The general formula for partial correlation for the
elimination of $n - 1$ independent variables ¹ is as follows:

$$r_{01.23\ldots n} = \frac{r_{01.23\ldots(n-1)} - r_{0n.23\ldots(n-1)}\,r_{1n.23\ldots(n-1)}}{\sqrt{1 - r_{0n.23\ldots(n-1)}^2}\,\sqrt{1 - r_{1n.23\ldots(n-1)}^2}}$$

When the number of variables to be partialed out is greater
than one, the calculation is laborious, but as in the case of
multiple regression described in the preceding chapter, a number of
time saving techniques and aids have been devised. The tables
by Holzinger ² and Miner ³ provide the square roots required
in the calculations. Holzinger ⁴ describes a method of
calculating partial correlation coefficients in which determinants
are used. The devices by Hull ⁵ and Wood ⁶ are also useful.
Baten has prepared tables for finding partial coefficients.⁷

A precise definition of partial correlation. The partial
correlation technique is commonly described as a means of eliminating
the effect of one or more other variables (factors of
heterogeneity) from a relationship or as a means of securing the
coefficient of correlation in a population for which the one or more
other variables are constant. A more accurate description is
that the coefficient of partial correlation is the correlation
between the residuals formed by subtracting the regressions of $x_0$
and $x_1$ on the variable or variables partialed out.⁸


¹ The reader should note that according to the symbolism used, n is the
number of independent variables. The total number of variables is n + 1.

² Holzinger, K. J. Statistical Tables for Students in Education and Psychology.
Chicago: University of Chicago Press, 1925. 74 pp.

³ Miner, J. R. Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for Use in Partial Correlation and
in Trigonometry. Baltimore: Johns Hopkins University Press, 1922. 50 pp.

⁴ Holzinger, K. J. Statistical Methods for Students in Education. Boston:
Ginn and Company, 1928, pp. 312-15.

⁵ Hull, C. L. “A Device for Determining Coefficients of Partial Correlation,”
Psychological Review, 28:377-83, September, 1921.

⁶ Wood, E. R. “A Graphic Method of Obtaining the Partial-Correlation
Coefficients and the Partial Regression Coefficients of Three or More Variables,”
Supplementary Educational Monographs, No. 37. Chicago: University of Chicago
Press, 1931. 72 pp. The charts described in this monograph are published by
E. R. Wood, State Department of Education, Columbus, Ohio.

⁷ Baten, W. D. “Tables for Finding the Partial Coefficient of Correlation,”
Journal of Experimental Education, 3:170-73, March, 1935.

⁸ For the proof of this statement, see Dunlap, J. W., and Cureton, E. E. “On

For example, $r_{01.2}$ is the coefficient of correlation between the residuals
$x_{0.2} = x_0 - r_{02}\frac{\sigma_0}{\sigma_2}x_2$ and
$x_{1.2} = x_1 - r_{12}\frac{\sigma_1}{\sigma_2}x_2$. When two variables
are partialed out, the residuals are
$x_{0.23} = x_0 - b_{02.3}x_2 - b_{03.2}x_3$ and
$x_{1.23} = x_1 - b_{12.3}x_2 - b_{13.2}x_3$. In general, $r_{01.23\ldots n}$
is the correlation between $x_{0.23\ldots n}$ and $x_{1.23\ldots n}$.

Dependability of coefficients of partial correlation. The 
definition of the coefficient of partial correlation as the correla- 
tion between residuals affords a basis for considering the de- 
pendability of this statistic when it is labeled the coefficient of 
net correlation; that is, the coefficient of correlation between two 
variables when the effect of one or more other variable is elim- 
inated or when one or more other variables are held constant 
The usual interpretation of a coefficient of partial correlation 
implies that the factor pattern is of the type 

Xq ^ (2i “t” ^2 “f“ CIa 
iCl = Cli 4“ <22 “h 0>3 
X2 == ai 


The residuals are Xq. 2 = (ai + ^2 + cla) — ro2-^ai 

<72 

Xi 2 = (u-i + a2 + ctz) Ti 2 ai 

<72 

In this case, which may be described as the one in which the 
variable partialed out is a component of the other two, ro2 ^ and 

(72 

ri2 ” are equal to unity ^ and the residuals are formed by 

<72 

the Analysis of Causation,” Journal of Educational Psychology, 21 664-65, 
Pecember, 1930 

The concept of partial correlation as the correlation between residuals (errors 
of estimate) appears in Yule’s treatment of the topic, but subsequent writers do 
not appear to have carried the idea over to the interpretation of the results of 
partial correlation Yule, G. XJ An Introduction to the Theory of Statistics, 
Eighth Edition London Charles Griffin and Company, Ltd , 1927, pp 233 f 

1 This follows from the fact that when iC 2 is a component of zq, rja = ^ 

CTq 

This condition implies equivalent units In case the units are not equivalent, 
the values of the two expressions will need to be weighted to compensate for the 
inequality 
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subtracting xa from both xq and Xi. Hence, when the factor pat- 
tern IS of this type, partial correlation does yield the desired 
net correlation. 

If $x_2$ is not a component of $x_0$ and $x_1$, $r_{02}\frac{\sigma_0}{\sigma_2}$ and $r_{12}\frac{\sigma_1}{\sigma_2}$ will not be unity and the effect of $x_2$ upon $x_0$ and $x_1$ will not be wholly eliminated. In addition, new factors of heterogeneity will be introduced. Hence, the value of $r_{01.2}$ will not be what is desired. It will be only a best estimate.

The operation of partial correlation can be illustrated by using variables constructed from uncorrelated components such as may be obtained by counting tosses of coins.^1 For example, suppose

$$X_0 = A_1 + A_2 + A_3 + A_4 + A_5$$
$$X_1 = A_1 + A_2 + A_6$$

The coefficient of correlation $r_{01} = .630$. If $X_2 = A_1$, it is a component of $X_0$ and $X_1$. Application of the partial correlation technique gives $r_{01.2} = .500$, which is approximately the value of the coefficient of correlation^2 between $X_{0.2} = A_2 + A_3 + A_4 + A_5$ and $X_{1.2} = A_2 + A_6$. If instead $X_2 = A_1 + A_4 + A_7$, $r_{01.2} = .55$, which shows that the correlation obtained is not that for $X_0$ and $X_1$ in a population homogeneous with respect to $X_2$.

1 See page 367 for explanation of procedure. In the illustration here and on the following pages N = 100. The numbers of coins tossed for the component variables were as follows: $A_1$, 30 coins; $A_2$, 28 coins; $A_3$, 24 coins; $A_4$, 20 coins; $A_5$, 16 coins; $A_6$, 16 coins; $A_7$, 16 coins. These calculations and others from similar data in the following pages are from Stuit, D. B. “Correlation Analysis as a Means of Solving Problems of Functional Relationship.” A thesis submitted for the degree of Ph.D. in Education. Urbana: University of Illinois, 1934. 109 pp.

2 The calculated value is .502. The slight discrepancy is due mainly to the fact that some of the intercorrelations between the component variables obtained from 100 tosses of coins are not precisely zero.

As another illustration, suppose

$$X_0 = A_1 + A_2 + A_6$$
$$X_1 = A_1 + A_3 + A_7$$

If $X_2 = A_1$ is partialed out, $r_{01.2} = -.025$, which is the coefficient of correlation between $A_2 + A_6$ and $A_3 + A_7$. If instead $X_2 = A_1 + A_4 + A_7$ is partialed out, $r_{01.2} = .23$. This is the coefficient of correlation between

$$\left[(A_1 + A_2 + A_6) - r_{02}\frac{\sigma_0}{\sigma_2}(A_1 + A_4 + A_7)\right] \quad\text{and}\quad \left[(A_1 + A_3 + A_7) - r_{12}\frac{\sigma_1}{\sigma_2}(A_1 + A_4 + A_7)\right]$$

Since neither $r_{02}\frac{\sigma_0}{\sigma_2}$ nor $r_{12}\frac{\sigma_1}{\sigma_2}$ is equal to 1.00, both residuals contain a fractional multiple of $A_1$ as a component. Hence, the variables defined by these residuals are not perfectly homogeneous with respect to $A_1$. In addition, the variable defined by the first difference has been made heterogeneous with respect to $A_4$ and $A_7$ and the variable defined by the second difference has been made heterogeneous with respect to $A_4$. Hence, if the purpose was to partial out $A_1$, it has not been accomplished in a satisfactory manner.
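The coin-tossing illustration can be repeated by simulation. The following is a minimal sketch, not the procedure of the thesis cited: it assumes fair coins, uses the coin counts given in the footnote, and takes an arbitrary random seed, so the coefficients it prints will differ somewhat from the values reported in the text. The function names `partial_r`, `residual`, and `corr` are illustrative only. The sketch computes $r_{01.2}$ twice, once from the usual first-order formula and once as the correlation between residuals, and the two agree.

```python
import numpy as np

def partial_r(r01, r02, r12):
    """First-order partial correlation r_{01.2} from the three zero-order r's."""
    return (r01 - r02 * r12) / np.sqrt((1 - r02**2) * (1 - r12**2))

def residual(y, x):
    """Residual of y after subtracting its regression on x (deviation form)."""
    y = y - y.mean()
    x = x - x.mean()
    return y - (y @ x) / (x @ x) * x

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)      # arbitrary seed (an assumption)
N = 100                             # number of cases, as in the text
coins = {1: 30, 2: 28, 3: 24, 4: 20, 5: 16, 6: 16, 7: 16}
# Each component A_k is the number of heads in a toss of its coins.
A = {k: rng.binomial(n, 0.5, size=N).astype(float) for k, n in coins.items()}

X0 = A[1] + A[2] + A[3] + A[4] + A[5]
X1 = A[1] + A[2] + A[6]

for label, X2 in [("X2 = A1", A[1]), ("X2 = A1 + A4 + A7", A[1] + A[4] + A[7])]:
    by_formula = partial_r(corr(X0, X1), corr(X0, X2), corr(X1, X2))
    by_residuals = corr(residual(X0, X2), residual(X1, X2))
    print(label, round(by_formula, 3), round(by_residuals, 3))
```

Only when $X_2 = A_1$ do the residuals reduce to the remaining components of $X_0$ and $X_1$; in the second case the same arithmetic is carried out, but the residuals are no longer free of $A_1$.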

Types of factor patterns in educational research. There is no technique for determining the factor pattern of three intercorrelated variables, but our experience suggests certain types for certain cases. Chronological age appears to approximate a component of mental age and of a number of other variables.^1 Hence, when chronological age is to be partialed out, we may usually assume a factor pattern of the type illustrated on page 380. It appears that few, if any, other variables may be regarded as components. Test scores always involve variable errors of measurement as a factor and they are likely to include one or more other factors that do not appear in the other two variables. Hence, the factor pattern in educational research is likely to be one of the more complex types.

1 Beyond certain limits the correlation may not be linear. Dunlap and Cureton suggest that where this condition prevails, chronological age may be eliminated satisfactorily by partialing out as separate variables chronological age, chronological age squared, chronological age cubed, and so on. They state that it is usually unnecessary to go beyond the cubes, and in a great many cases not beyond the squares.

Dunlap, J. W., and Cureton, E. E. “On the Analysis of Causation,” Journal of Educational Psychology, 21: 663, December, 1930. Dunlap and Cureton refer in this connection to the work of Fisher on time series, which are essentially similar to chronological age.

Fisher, R. A. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd, 1925, p. 172.

Semi-partial correlation. By means of a modification of the technique described in the preceding pages, the partialing out process may be applied to only one of the two correlated variables. By applying this modified technique, which is known as semi-partial correlation,^1 we may obtain the correlation between $\left(x_0 - r_{02}\frac{\sigma_0}{\sigma_2}x_2\right)$ and $x_1$. The formula for accomplishing this is

$$r_{(0.2)1} = \frac{r_{01} - r_{02}r_{12}}{\sqrt{1 - r_{02}^2}}$$

1 For derivation and discussion of semi-partial correlation for any number of variables, see

Dunlap, J. W., and Cureton, E. E. “On the Analysis of Causation,” Journal of Educational Psychology, 21: 663-72, December, 1930.

From their general development, Dunlap and Cureton derive formulae for certain other systems of semi-partial correlation.

The original contribution of semi-partial correlation should be credited to Spearman, although his formula for three variables differs slightly from that of Dunlap and Cureton. The present writers are grateful to Dr. B. S. Burks for calling their attention to this point. See

Spearman, C. “The Proof and Measurement of Association between Two Things,” American Journal of Psychology, 15: 94, 1904.

Franzen also proposed a formula for semi-partial correlation prior to the contribution of Dunlap and Cureton. It is identical, except in symbols, with their formula for three variables. See

Franzen, Raymond. “A Comment on Partial Correlation,” Journal of Educational Psychology, 19: 194-97, March, 1928.

A reader who is making an intensive study of correlation analysis should become familiar also with part correlation, which was devised by B. B. Smith in collaboration with Mordecai Ezekiel. The first account of this technique was published in “Correlation Theory and Method Applied to Agricultural Research,” mimeographed publication, Bureau of Agricultural Economics, United States Department of Agriculture, August, 1926, pp. 57-60. Derivation and explanation are given in

Ezekiel, Mordecai. Methods of Correlation Analysis. New York: John Wiley and Sons, 1930, pp. 181-84, 379-80.

The part correlation of $x_0$ with $x_1$ is the correlation between $x_1$ and the residual formed by subtracting the terms of the multiple regression equation, except that containing $x_1$, from $x_0$. Thus, for the case of two independent variables, the coefficient of part correlation is the correlation between $(x_0 - b_{02.1}x_2)$ and $x_1$. In the case of semi-partial correlation, the residual would be $x_0 - b_{02}x_2$. The coefficient of part correlation, ${}_{01}r_{2}$, represents the correlation between variable $x_1$ and the residue of variable $x_0$ after eliminating an estimate of the contribution of the part of variable $x_2$ which is independent of $x_1$. The coefficient of semi-partial correlation, $r_{(0.2)1}$, represents the correlation between variable $x_1$ and the residue of variable $x_0$ after eliminating an estimate of the contribution of all of variable $x_2$.

As an illustration of the application of semi-partial correlation, Dunlap and Cureton cite the problem of investigating the possibility of home environment contributing to the intelligence test scores of children.^1 Obviously the intelligence of parents will contribute to home environment, the more intelligent parents providing the better environment. Hence, we shall expect a moderate degree of correlation between child intelligence test scores and measures of home environment due to the operation of parental intelligence as a common cause. Partial correlation is not satisfactory for dealing with this problem because measures of parental intelligence would be partialed out from both child intelligence test scores and measures of home environment. Assuming that parental intelligence as measured is a component of both the measures of environment and of the measures of child intelligence,^2 the net correlation obtained by means of partial correlation would be between that part of child intelligence which is independent of parental intelligence and that part of environment which is independent of parental intelligence. We are interested, however, in the correlation between all of child intelligence and that part of environment which is independent of parental intelligence. In other words, we desire the correlation between child intelligence and environment in an environment that is homogeneous with reference to parental intelligence. Semi-partial correlation provides a technique for partialing out measures of parental intelligence from only the measures of environment. This technique, however, will not be satisfactory unless the measures of parental intelligence are completely contributed to the measures of environment.
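The semi-partial coefficient may be verified numerically. The sketch below is illustrative only: the three variables are artificial, the weights used to build them and the random seed are assumptions, and the name `semipartial_r` is hypothetical. It compares the formula value of $r_{(0.2)1}$ with the correlation computed directly between $x_1$ and the residual of $x_0$ on $x_2$.

```python
import numpy as np

def semipartial_r(r01, r02, r12):
    """r_{(0.2)1}: correlation of x1 with x0 after x2 is partialed out of x0 only."""
    return (r01 - r02 * r12) / np.sqrt(1 - r02**2)

rng = np.random.default_rng(1)          # arbitrary seed (an assumption)
n = 500
x2 = rng.normal(size=n)                 # variable to be partialed out
x0 = 0.6 * x2 + rng.normal(size=n)      # artificial weights, for illustration
x1 = 0.5 * x2 + 0.4 * x0 + rng.normal(size=n)

r = np.corrcoef([x0, x1, x2])
r01, r02, r12 = r[0, 1], r[0, 2], r[1, 2]

# Residual of x0 after subtracting its regression on x2 (deviation form).
d0, d2 = x0 - x0.mean(), x2 - x2.mean()
resid0 = d0 - (d0 @ d2) / (d2 @ d2) * d2
direct = np.corrcoef(resid0, x1)[0, 1]

print(round(semipartial_r(r01, r02, r12), 4), round(direct, 4))
```

The agreement of the two printed values shows only that the formula and the residual definition are the same statistic; it does not show that the factor pattern assumed in interpreting the result is correct.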

1 The reader who consults the reference by Dunlap and Cureton should note that their system of symbolism is not identical with that used in the formula given here.

2 This assumption probably is not true.

3 The interpretation of the coefficient of correlation as an index of probable errors of measurement was treated in Chapter VII and as an index of predictive efficiency in Chapter X. See also the discussion of the coefficient of correlation in Chapter IV.

Interpretation of coefficients of correlation when the relationship is not one of cause and effect.^3 Although the variance ratio and its relation to the coefficient of correlation have been introduced as techniques for the study of cause and effect relationships, they are applicable in other situations. A test score may be thought of as the algebraic sum of the true score and the variable errors of measurement. In terms of symbols, with $X_t$ denoting the true score,

$$X_1 = X_t + e_1$$

A score on a duplicate form of the test will be represented by

$$X_2 = X_t + e_2$$

It may be assumed that $\sigma_{e_1} = \sigma_{e_2}$. Hence,

$$r_{12} = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_e^2}$$

This relationship may be expressed by saying that the per cent of the variance of a group of test scores due to the trait measured by the test is given by the coefficient of reliability.^1 It may also be stated that the per cent of the variance due to the variable errors of measurement is given by $1 - r_{12}$. For example, when the coefficient of reliability is .90, then 90 per cent of the variance of the test scores is due to the trait measured by the test and 10 per cent of it is due to the variable errors of measurement.
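The reading of the coefficient of reliability as a variance ratio can be checked with artificial scores. In the sketch below the standard deviations of the true scores and of the errors (3 and 1) are hypothetical values chosen so that the ratio is .90, and the errors on the two forms are assumed to be independent of each other and of the true score.

```python
import numpy as np

rng = np.random.default_rng(4)      # arbitrary seed (an assumption)
n = 100_000
sigma_t, sigma_e = 3.0, 1.0         # hypothetical true-score and error SDs

true_score = rng.normal(0.0, sigma_t, n)
form_1 = true_score + rng.normal(0.0, sigma_e, n)   # score on the first form
form_2 = true_score + rng.normal(0.0, sigma_e, n)   # score on the duplicate form

reliability = np.corrcoef(form_1, form_2)[0, 1]
variance_ratio = sigma_t**2 / (sigma_t**2 + sigma_e**2)
print(round(reliability, 3), variance_ratio)
```

The correlation between the two forms falls close to the variance ratio, which is the sense in which the coefficient gives the per cent of variance due to the trait measured.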

1 It should be noted that the coefficient of reliability must be for the population being considered.

When a new test is constructed, the correlation of the scores yielded by it with the criterion measures is calculated as an index of its validity. In the typical case, the ratio of the variance of the common factor to the variance of the scores yielded by the test is not greatly less than the coefficient of validity. For example, suppose a coefficient of validity of .80 is obtained for a given test. Then approximately 80 per cent of the variance of the test scores is due to the common factor, i.e., to the factor of the criterion that the test measures. Less precisely, it may be said that on the average the test measures 80 per cent of what it is intended to measure.

This interpretation of a coefficient of validity should not be made unless it is clearly understood. In the first place, it is applicable only to the population represented by the data for which the coefficient of validity was calculated. Furthermore, the statement that “on the average the test measures 80 per cent of what it is intended to measure” is appropriate only when the scores are expressed in deviation form. When the test is designed to measure achievement, the interpretation is subject to a further limitation. In most subject-matter fields, there is a fairly high correlation between the scores yielded by an achievement test and those yielded by an intelligence test. This means that a factor of what we measure as achievement is also measured by an intelligence test. A similar statement may be made with reference to criterion measures. Hence, it seems reasonable to say that a portion of the factor common to the achievement test scores and the criterion measures consists of general intelligence. If achievement is defined as the product of learning, that is, as “net achievement,” which is implied when gains are computed,^1 the interpretation suggested in the preceding paragraph is not appropriate. In other words, the suggested interpretation of a coefficient of validity is not applicable in the case of gains in achievement or when the meaning of “net achievement” is intended.

1 See page 303.

This point may be illustrated by noting the probable nature of the factor pattern of the test scores and of the criterion measures. Let $a_1$ represent the common factor contributed by intelligence, $a_2$ the remainder of the common factor, $a_3$ the specific factor of the test scores, $a_4$ the remainder of the net achievement as defined by the criterion, and $e_0$ and $e_1$ the respective variable errors of measurement.

$$x_0 = a_1 + a_2 + a_3 + e_0$$
$$x_1 = a_1 + a_2 + a_4 + e_1$$

It is apparent that $r_{01}$ would be greater than $r_{(x_0 - a_1)(x_1 - a_1)}$, which would be an index of the portion of the “net achievement” measured by the test. Hence, the value of $r_{01}$ will exaggerate the validity of the test when it is thought of as measuring the product of learning. A more dependable index of the validity would be obtained if the contribution of general intelligence could be partialed out, but from our knowledge of the nature of intelligence tests, it is apparent that this cannot be done satisfactorily. Partialing out general intelligence may, however, indicate the extent to which the calculated coefficient of validity should be discounted. For example, Wood^1 reported a correlation of .605 between the scores yielded by a true-false test and examination grades. The correlations with intelligence test scores were .451 and .386. The coefficient of partial correlation is .523. This value is probably somewhat in excess of the actual correlation between the true-false test scores and the examination grades after the effect of general intelligence is eliminated. But it indicates that the reported coefficient of validity of .605 should be rather heavily discounted when an interpretation is made relative to the community of function of the true-false test and the essay examination exclusive of general intelligence.
The hazards of applying partial correlation in such cases may be indicated by employing correlations reported by Corey.^2 He gives coefficients of correlation corrected for attenuation as follows:

New-Type Test and Essay Examination    .93
New-Type Test and Army Alpha           .62
Essay Examination and Army Alpha       .39

When intelligence is partialed out, the correlation between new-type test scores and essay examination grades is .95, which does not seem reasonable.
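Both partial coefficients quoted above follow from the usual first-order formula applied to the reported correlations. The short sketch below merely repeats that arithmetic; the function name `partial_r` is illustrative.

```python
from math import sqrt

def partial_r(r01, r02, r12):
    """First-order partial correlation r_{01.2}."""
    return (r01 - r02 * r12) / sqrt((1 - r02**2) * (1 - r12**2))

# Wood: true-false test and examination grades, intelligence partialed out.
wood = partial_r(0.605, 0.451, 0.386)

# Corey: new-type test and essay examination, Army Alpha partialed out.
corey = partial_r(0.93, 0.62, 0.39)

print(round(wood, 3), round(corey, 2))   # 0.523 0.95
```

In Corey's case the partial coefficient exceeds the original coefficient of .93, which is the result the text describes as not seeming reasonable.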

The need for critical thinking in interpreting partial correlation coefficients is apparent. The promiscuous calculation of coefficients of partial correlation and the mechanical interpretation of them without considering the probable nature of the factor patterns involved have misled many investigators. It should be emphasized that partial correlation accomplishes what it is commonly considered to accomplish only when the factor pattern is of the type illustrated on page 380.

1 Wood, B. D. Measurement in Higher Education. Yonkers-on-Hudson: World Book Company, 1923, p. 188.

2 Corey, S. M. “The Correlation between New-Type and Essay Examination Scores and the Relationship between Them and Intelligence as Measured by Army Alpha,” School and Society, 32: 849-50, December, 1930.

When dealing with data in the form of ratios such as IQ’s, achievement quotients, or per cents, it is necessary to recognize what is known as “spurious correlation.” If an intelligence test and an achievement test are administered to a group of children heterogeneous with reference to chronological age and the correlation between the obtained scores is zero, which would probably be the case if the reliabilities of the tests were zero, the correlation between the IQ’s and AQ’s would be .50. This correlation is not without meaning, but care must be exercised in interpreting it or any coefficient of correlation obtained from ratios. The calculation of ratios introduces a common component or increases the one in the original measures.^1
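The value of .50 can be approximated by simulation under assumptions that the text does not state and that are supplied here only for illustration: both quotients are taken to be formed by dividing an age score by chronological age, the three measures are mutually independent, and all three have the same coefficient of variation (here .10). The means, standard deviations, and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)      # arbitrary seed (an assumption)
n = 200_000

# Hypothetical, mutually independent measures with equal coefficients of
# variation (standard deviation / mean = 0.10 for each).
mental_age = rng.normal(100.0, 10.0, size=n)
achievement_age = rng.normal(100.0, 10.0, size=n)
chronological_age = rng.normal(100.0, 10.0, size=n)

iq = 100 * mental_age / chronological_age
aq = 100 * achievement_age / chronological_age   # assumed form of the quotient

r_scores = np.corrcoef(mental_age, achievement_age)[0, 1]
r_ratios = np.corrcoef(iq, aq)[0, 1]
print(round(r_scores, 3), round(r_ratios, 3))
```

The scores themselves are uncorrelated, yet the quotients correlate at roughly .5 because the common divisor enters both. With unequal coefficients of variation the spurious value would be different.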

1 Yule, G. U. An Introduction to the Theory of Statistics. London: Charles Griffin and Co., Ltd., 1917, pp. 214-15.

Holzinger, K. J. “Formulas for the Correlation between Ratios,” Journal of Educational Psychology, 14: 344-46, September, 1923.

Thomson, G. H., and Pintner, Rudolph. “Spurious Correlation and Relationship between Tests,” Journal of Educational Psychology, 15: 433-44, October, 1924.

For additional references on this topic, see Walker, Helen M. Studies in the History of Statistical Method. Baltimore: Williams and Wilkins Company, 1929, p. 124.

Partial correlation as a means of identifying cause and effect relationships. On page 371, it was emphasized that the existence of correlation between two sets of paired measures is not proof that one variable is a cause of the other. The relationship may be due to the operation of a common cause. When it is effective, partial correlation removes the contribution of a common cause from the common factor. This suggests that if all factors of heterogeneity were partialed out, the resulting coefficient of partial correlation would be evidence of a cause and effect relationship between the two original variables. Theoretically this argument is defensible, but in practice a coefficient of partial correlation should not be interpreted as evidence of a cause and effect relationship because this statistic is not dependable unless the variable partialed out is a component of the other two, and because it is not possible to determine when all factors of heterogeneity have been included. It is exceedingly unfortunate that a number of writers have asserted or implied that partial correlation may be employed to identify cause and effect relationships.^1

1 For example, see

McCall, W. A. How to Experiment in Education. New York: The Macmillan Company, 1923, p. 239.

Reavis, G. H. “Factors Controlling Attendance in Rural Schools,” Teachers College, Columbia University Contributions to Education, No. 108. New York: Bureau of Publications, Teachers College, Columbia University, 1920. 69 pp.

D. Regression Equations and Factor Analysis

Contributions of two or more independent variables. A dependent variable may be thought of as the sum of the contributions of its causes. A number of persons have interpreted the coefficients of the multiple regression equation as measuring these contributions. A case that has attracted considerable attention is that of the following equation reported by Burt.^2

Binet = .54 School Work + .33 Intelligence + .11 Age

In interpreting this equation Burt says that

    of the gross result (mental age score on Binet test), then, one-ninth is attributable to age, one-third to intellectual development, and over one-half to school attainment. School attainment is thus the preponderant contributor to the Binet-Simon tests. To school the weight assigned is nearly double that of intelligence alone, and distinctly more than that of intelligence and age combined. In determining the child’s performance in the Binet-Simon Scale, intelligence can bestow but little more than half the share of the school, and age but one-third the share of intelligence.

This statement has been criticized by Holzinger and Freeman.^3 In the first place the regression equation does not prove the existence of a cause and effect relationship between the variables of the equation. The regression equation may properly be thought of as representing a means of predicting mental age as measured by the Binet test, but this concept is very different from thinking of the variables considered independent as representing causes contributing to the dependent variable as an effect.^4 However, if the assumption of a cause and effect relationship is granted, Burt’s interpretation of the regression coefficients is still not justified.

2 Burt, C. Mental and Scholastic Tests. London: P. S. King and Son, 1921, p. 183.

3 Holzinger, K. J., and Freeman, F. N. “The Interpretation of Burt’s Regression Equation,” Journal of Educational Psychology, 16: 577-82, December, 1925.

Thomson agrees with the criticism by these writers. His article and the rejoinder by Holzinger and Freeman are of interest in this connection.

Thomson, G. H. “The Interpretation of Burt’s Regression Equation,” Journal of Educational Psychology, 17: 301-09, May, 1926.

Holzinger, K. J., and Freeman, F. N. “Rejoinder on Burt’s Regression Equation,” Journal of Educational Psychology, 17: 384-86, May, 1926.

4 The fact that a regression equation is not proof of the existence of a cause and effect relationship should be emphasized. The use of the verb “contribute” in referring to the relation between the variables of a regression equation suggests, if it does not definitely imply, a cause and effect relationship. Sometimes the writer is probably aware that the assumption of such relationship is not justified and does not intend to imply it. Even when this is obviously the case, the uncritical reader is likely to be misled. For example, this is likely to happen in reading Garrett’s discussion on pages 256-57 of his text, Statistics in Psychology and Education. Unfortunately some writers appear to think of the regression equation as evidence of a cause and effect relationship. Some illustrations are to be found in the studies of the factors associated with teaching success. See pages 353-54 for references.

As an introduction to an appropriate interpretation, consider the case in which $x_0 = x_1 + x_2$, the independent variables being uncorrelated. Since the variables are expressed as deviations from their respective means,

$$\sigma_0^2 = \frac{\sum x_0^2}{N} = \frac{\sum (x_1 + x_2)^2}{N} = \frac{\sum (x_1^2 + 2x_1x_2 + x_2^2)}{N} = \frac{\sum x_1^2}{N} + \frac{2\sum x_1x_2}{N} + \frac{\sum x_2^2}{N}$$

Since $x_1$ and $x_2$ are uncorrelated, $\sum x_1x_2 = 0$ and we have $\sigma_0^2 = \sigma_1^2 + \sigma_2^2$. Hence, the per cent of the variance of $x_0$ contributed by $x_1$ is given by the ratio $\frac{\sigma_1^2}{\sigma_0^2}$. Similarly, $\frac{\sigma_2^2}{\sigma_0^2}$ gives the per cent contributed by $x_2$. The value of the first of these variance ratios is given by $r_{01}^2$ and that of the second by $r_{02}^2$. This argument may be extended to any number of uncorrelated component variables. If the given independent variables do not completely account for the dependent variable, u is added to the right-hand member of the equation to designate the unmeasured causes.
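The case of uncorrelated components can be illustrated with artificial data. In the sketch below the standard deviations of the two components (3 and 4) are hypothetical, chosen so that the variance ratios are .36 and .64; the seed is arbitrary. The squared coefficients of correlation are compared with the variance ratios.

```python
import numpy as np

rng = np.random.default_rng(3)      # arbitrary seed (an assumption)
n = 100_000
x1 = rng.normal(0.0, 3.0, n)        # hypothetical sigma_1 = 3
x2 = rng.normal(0.0, 4.0, n)        # hypothetical sigma_2 = 4
x0 = x1 + x2                        # x0 is exactly the sum of its components

share_1 = x1.var() / x0.var()       # variance ratio for x1
share_2 = x2.var() / x0.var()       # variance ratio for x2
r01 = np.corrcoef(x0, x1)[0, 1]
r02 = np.corrcoef(x0, x2)[0, 1]

print(round(share_1, 3), round(r01**2, 3))
print(round(share_2, 3), round(r02**2, 3))
```

Each squared coefficient falls close to the corresponding variance ratio; the small departures arise because the sampled components are not precisely uncorrelated.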

In the typical relationship in education, the independent variables are not components of the dependent variable and are usually correlated. In such cases the dependent variable cannot be precisely expressed as a linear function of the given independent variables and the unmeasured causes. Hence, the equation formed is only an approximate expression of the relationship. In Chapter X, page 324, the regression equation was introduced as the best estimate of a dependent variable that could be made from the given independent variables. For two independent variables the estimate $\bar{x}_0$ is

$$\bar{x}_0 = b_{01.2}x_1 + b_{02.1}x_2$$

The expression $b_{01.2}x_1 + b_{02.1}x_2$ gives the best estimates^1 of $x_0$ to be made from $x_1$ and $x_2$. Hence it will not be equal to $x_0$. This condition makes it necessary to add u to represent the effect of using estimates of the contributions of $x_1$ and $x_2$ plus any unmeasured causes. In the following development, u is assumed^2 to be uncorrelated with the given independent variables.

1 The introduction of the concept of a component variable, i.e., one that is contributed completely to the dependent variable, provides a basis for a significant comment upon the multiple regression equation and errors of estimate. If the independent variables are uncorrelated components, the errors of estimate are due to the presence of one or more unmeasured causes, but when the independent variables are not components, this condition serves to increase the errors of estimate. The errors of estimate in predictions are commonly explained as being due to unmeasured causes. It seems likely that a considerable portion of the errors of estimate is due to variable errors of measurement and variable errors of validity in the independent variables. See pages 358 f.

2 This assumption is probably only approximated.

$$\sigma_0^2 = \frac{\sum x_0^2}{N} = \frac{\sum (b_{01.2}x_1 + b_{02.1}x_2 + u)^2}{N}$$

$$= \frac{\sum (b_{01.2}^2x_1^2 + b_{02.1}^2x_2^2 + 2b_{01.2}b_{02.1}x_1x_2 + u^2 + 2b_{01.2}x_1u + 2b_{02.1}x_2u)}{N}$$

$$= \frac{b_{01.2}^2\sum x_1^2}{N} + \frac{b_{02.1}^2\sum x_2^2}{N} + \frac{2b_{01.2}b_{02.1}\sum x_1x_2}{N} + \frac{\sum u^2}{N} + \frac{2b_{01.2}\sum x_1u}{N} + \frac{2b_{02.1}\sum x_2u}{N}$$

Since u is assumed to be uncorrelated with either $x_1$ or $x_2$, $\sum x_1u$ and $\sum x_2u$ are zero. Hence the last two fractions in the above equation disappear. The remaining product term may be transformed as follows:

$$\frac{2b_{01.2}b_{02.1}\sum x_1x_2}{N} = 2b_{01.2}b_{02.1}\frac{\sum x_1x_2}{N\sigma_1\sigma_2}\sigma_1\sigma_2 = 2b_{01.2}b_{02.1}r_{12}\sigma_1\sigma_2$$

Hence we have

$$\sigma_0^2 = b_{01.2}^2\sigma_1^2 + b_{02.1}^2\sigma_2^2 + 2b_{01.2}\sigma_1b_{02.1}\sigma_2r_{12} + \sigma_u^2$$

As expressed by this equation, $x_1$ makes two contributions to $\sigma_0^2$, the variance of $x_0$. The “best” estimate of the first is given by $b_{01.2}^2\sigma_1^2$. Call it the direct (individual) contribution. Similar statements may be made with reference to $x_2$. The “best” estimates of the other contributions of the two variables are combined in the product term, which may be thought of as their joint (indirect) contribution. The remaining term $\sigma_u^2$ represents the contributions of the unmeasured causes plus the attenuating effect of using the right-hand member of the regression equation in expressing the functional relationship. The per cents of the total variance may be obtained by dividing both sides of the equation by $\sigma_0^2$:

$$1 = b_{01.2}^2\frac{\sigma_1^2}{\sigma_0^2} + b_{02.1}^2\frac{\sigma_2^2}{\sigma_0^2} + 2b_{01.2}\frac{\sigma_1}{\sigma_0}b_{02.1}\frac{\sigma_2}{\sigma_0}r_{12} + \frac{\sigma_u^2}{\sigma_0^2}$$

The first term of the right-hand member is the square of the corresponding beta coefficient^1 of the multiple regression equation and might be written $\beta_{01.2}^2$. Similarly, the third term might be written $2r_{12}\beta_{01.2}\beta_{02.1}$. Several writers have written the equation as follows:

$$1 = d_{01.2} + d_{02.1} + 2p_{01}p_{02}r_{12} + d_{0u}$$
$$= d_{01.2} + d_{02.1} + d_{012} + d_{0u}$$

1 See page 325.

The term $d_{01.2}$ is read the coefficient of direct determination of $x_1$ with respect to $x_0$. The symbol $p_{01}$ is read the path coefficient^1 connecting $x_0$ and $x_1$. The term $d_{012}$ is read the coefficient of joint (indirect) determination of $x_1$ and $x_2$ with respect to $x_0$. The values of the first three terms of the right-hand member of the above equations can be calculated from the data. The value of $d_{0u}$ is obtained by subtracting the sum of these terms from 1.00.

This development, which may be extended to n independent 
variables, affords a means of interpreting the coefficients of the 
multiple regression equation in terms of the contributions of the 
independent variables to the variance of the dependent variable 
This interpretation is most conveniently made when the multiple 
regression equation is expressed in terms of beta coefficients or 
coefficients of determination For n independent variables the 
equation in terms of the latter would be 

1 The term “path coefficient” was proposed by Sewall Wright m developing a 
different technique for dealing with this problem, but it has been shown that 
path coefficients are merely the beta coefiicients of the regression equation and 

the expression boi 2 — is equal to poi Smce the regression equation is an estab- 

CTO 

hshed statistical procedure, the development given here seems preferable 
Wright’s work, however, is significant since it provides a basis for utilizing the 
regression equation in studying functional relationships 

Wright, Sewall “Correlation and Causation,” Journal of Agricultural Re- 
search, 20 557-85, January, 1921 

Wright, Sewall “The Theory of Path Coefficients,” Genetics, 8 238-55, 
May, 1923 

The reader who consults these references to Wright’s work will note that the 
present writers have made slight changes in his symbolism in order to provide a 
more consistent system 

For proof of the identity of path coefficients and beta coefficients, see 

Kelly, E L “The Relationship between the Techniques of Partial Correla- 
tion and Path Coefficients,” Journal of Educational Psychology, 20 119-24, 
February, 1929 

Dunlap, J W , and Cureton, E E “On the Analysis of Causation,” Journal 
of Educational Psychology, 21 673-75, December, 1930 

$$1 = d_{01.23 \ldots n} + d_{02.13 \ldots n} + \cdots + d_{0n.12 \ldots (n-1)} + d_{012.34 \ldots n} + \cdots + d_{0ij. \ldots n} + d_{0u}$$

In the next to the last term, $i$ takes any value from 1 to $n$, and 
$j$ takes any value from 1 to $n$ other than $i$. There will be ${}_nC_2$ 
such terms, i.e., terms for all possible pairs of the independent 
variables. In this equation for the general case the coefficients 
of joint determination may be expressed in terms of the coefficient 
of correlation between the two independent variables and 
the path coefficients connecting them with the dependent variable. 
For example, 

$$d_{012.34 \ldots n} = 2\, r_{12}\, p_{01.23 \ldots n}\, p_{02.13 \ldots n}$$

The square of the coefficient of multiple correlation 
$R_{0.123 \ldots n}$ is equal to the sum of the coefficients of determination 
involving the independent variables $x_1, x_2, x_3 \ldots x_n$. In 
other words, $R^2_{0.123 \ldots n} = 1 - d_{0u}$. This relationship provides 
a means for checking the calculations made in computing the 
several coefficients of determination. It should be noted, however, 
that when $R^2_{0.123 \ldots n}$ is used as a measure of the total 
proportion of the variance of $x_0$ which is due to $x_1, x_2, x_3 \ldots x_n$, 
it is subject to the limitation noted on page 392. Only when 
$x_1, x_2, x_3 \ldots x_n$ are uncorrelated components does $R^2_{0.123 \ldots n}$ 
measure the proportion of the variance of $x_0$ due to these factors 
with precision. $R^2_{0.123 \ldots n}$ is called the coefficient of multiple 
determination. 

Interpretation of regression coefficients. The preceding 
argument suggests a plan for interpreting coefficients of the 
multiple regression equation as measures of the contributions 
of the independent variables to the dependent variable of a 
causal relationship. If the regression equation is not expressed 
in terms of beta coefficients, the first step is to compute them by 
means of the relationship given on page 325. Second, compute 
the squares of the beta coefficients and the products of the type 

$$2\, r_{12}\, \beta_{01.23 \ldots n}\, \beta_{02.13 \ldots n}$$

These results are coefficients of determination, and hence are 
values of variance ratios. The interpretation of coefficients of 
determination is considered on page 397. 
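
The two steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout, and the numerical values are hypothetical and are not taken from the text. It squares each beta coefficient to obtain the coefficients of direct determination and forms $2 r_{ij} \beta_i \beta_j$ for each pair to obtain the coefficients of joint determination.

```python
from itertools import combinations


def determination_coefficients(betas, r):
    """Return (direct, joint) coefficients of determination.

    betas: beta (path) coefficients for x1..xn, in order.
    r: dict mapping a pair (i, j) with i < j (1-based) to the correlation
       between the independent variables xi and xj.
    """
    n = len(betas)
    # Direct determination: the square of each beta coefficient.
    direct = {i + 1: betas[i] ** 2 for i in range(n)}
    # Joint determination: 2 * r_ij * beta_i * beta_j for every pair.
    joint = {
        (i, j): 2 * r[(i, j)] * betas[i - 1] * betas[j - 1]
        for i, j in combinations(range(1, n + 1), 2)
    }
    return direct, joint


# Hypothetical illustration values (assumptions, not data from the text).
betas = [0.40, 0.30]
r = {(1, 2): 0.50}

direct, joint = determination_coefficients(betas, r)
total = sum(direct.values()) + sum(joint.values())
d_0u = 1.0 - total  # remainder attributed to the unknown component
print(direct, joint, round(total, 4), round(d_0u, 4))
```

With $n$ independent variables the function yields $n$ direct terms and ${}_nC_2$ joint terms, matching the count given for the general equation.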

Calculation of path (beta) coefficients: Wright’s method. 
Wright’s method of calculating path (beta) coefficients is based 
on a theorem which may be stated as follows: 

Given $x_0$ the dependent variable, and $x_1, x_2 \ldots x_n$ independent 
variables, the coefficient of correlation between $x_0$ and 
any one of the independent variables, or between any two of 
the independent variables, is equal to the path coefficient connecting 
the two variables plus the sum of the products of the 
path coefficients along paths of indirect connection, paths 
through the dependent variable not being included.¹ 

The application of this theorem may be explained by employing 
a diagrammatic representation suggested by Wright and 
used in a slightly modified form by Burks² and Heilman.³ This 
plan of representation is shown for three independent variables 
in Figure 6. Each of the variables and the unknown (remaining 
component of the dependent variable) are represented by small 
rectangles, but the reader should bear in mind that neither the 
size of the rectangles nor their relative position has any significance. 
The lines connecting the rectangles merely indicate 
paths of influence. The reader should note that the line connecting 
two independent variables indicates that the path coefficient 
may be thought of with respect to either variable as 
the dependent one. The value of the path coefficient is the same, 
but if it is desired to recognize the direction of the assumed relationship, 
the subscript of the variable considered dependent 
may be written in first position. 

1 For the proof of this theorem see the references to Wright’s work on 
page 393. 

2 Burks, B. S. “The Relative Influence of Nature and Nurture upon Mental 
Development; a Comparative Study of Foster Parent-Foster Child Resemblance 
and True Parent-True Child Resemblance,” The Twenty-Seventh Yearbook 
of the National Society for the Study of Education, Part I. Bloomington, Illinois: 
Public School Publishing Company, 1928, pp. 299-301. 

3 Heilman, J. D. “The Relative Influence upon Educational Achievement of 
Some Hereditary and Environmental Factors,” The Twenty-Seventh Yearbook of 
the National Society for the Study of Education, Part II. Bloomington, Illinois: 
Public School Publishing Company, 1928, pp. 35-65. For a more extended 
account, see 

Heilman, J. D. “Factors Determining Achievement and Grade Location,” 
Journal of Genetic Psychology, 36: 435-56, September, 1929. 

For three independent variables as illustrated in Figure 6, 
the application of Wright’s theorem gives the following equations.¹ 

$$r_{01} = p_{01} + p_{12}p_{02} + p_{31}p_{03} + p_{12}p_{32}p_{03} + p_{13}p_{23}p_{02}$$

$$r_{02} = p_{02} + p_{12}p_{01} + p_{32}p_{03} + p_{23}p_{13}p_{01} + p_{12}p_{31}p_{03}$$

$$r_{03} = p_{03} + p_{13}p_{01} + p_{23}p_{02} + p_{13}p_{21}p_{02} + p_{23}p_{12}p_{01}$$

$$r_{12} = p_{12} + p_{32}p_{13}$$

$$r_{13} = p_{13} + p_{23}p_{12}$$

$$r_{23} = p_{23} + p_{13}p_{21}$$
Fig. 6. Path coefficient diagram for three independent variables. $p_{01}$, 
$p_{02}$, and $p_{03}$ are also the beta coefficients $\beta_{01.23}$, $\beta_{02.13}$, and $\beta_{03.12}$. 

Usually, we are concerned only with the values of the path 
coefficients of the paths leading to the dependent variable since 
these are the only ones used in calculating the coefficients of 
determination of the types $d_{01.234 \ldots n}$ and $d_{012.34 \ldots n}$. When the 
equivalents of $r_{12}$, $r_{13}$, and $r_{23}$ are substituted in the equations 
for $r_{01}$, $r_{02}$, and $r_{03}$, they simplify² to 

$$r_{01} = p_{01} + r_{12}p_{02} + r_{13}p_{03}$$

$$r_{02} = p_{02} + r_{23}p_{03} + r_{12}p_{01}$$

$$r_{03} = p_{03} + r_{13}p_{01} + r_{23}p_{02}$$

1 The path coefficients are written with simplified subscripts. The more 
elaborate subscripts are useful for indicating the connection with regression 
coefficients. The subscripts of coefficients of determination may also be simplified. 

2 Examination of the equations will indicate how they may be extended to any 
number of independent variables. Furthermore, they may be written without 

These simultaneous equations may be solved by further substitution, 
but when there are several independent variables, 
it is advisable to employ the methods developed for calculating 
multiple regression equations.¹ After the path (beta) coefficients 
have been obtained, the calculation of the coefficients of 
determination is a simple matter. 
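
As a sketch of how the simplified equations might be solved mechanically, the following Python fragment treats them as a set of linear equations in the path coefficients and applies ordinary Gaussian elimination. This is a generic elimination routine, not the economical method referred to in the footnote, and the correlations used are hypothetical values chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical correlations (assumptions for illustration only).
# r_indep[i][j] is the correlation between independent variables i+1 and j+1.
r_indep = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
# r_dep[i] is the correlation between x0 and independent variable i+1.
r_dep = [0.6, 0.5, 0.4]

# Each row states r_0i = p_0i + sum over j != i of r_ij * p_0j.
p = solve_linear(r_indep, r_dep)

# Sum of all direct and joint coefficients of determination.
n = len(p)
total = sum(p[i] ** 2 for i in range(n)) + sum(
    2 * r_indep[i][j] * p[i] * p[j] for i in range(n) for j in range(i + 1, n)
)
print([round(v, 4) for v in p], round(total, 4))
```

The total printed at the end equals the sum of the products $p_{0i} r_{0i}$, which is the square of the coefficient of multiple correlation and so provides a check on the arithmetic.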

Interpretation of coefficients of determination. A coefficient 
of determination represents the value of a variance ratio and 
should be interpreted as such. A coefficient of direct determination 
gives a measure of only the direct contributions from an 
independent variable. A share of one or more of the joint contributions 
must be added. Wright did not develop a technique 
for dividing the joint contributions. Heilman divided them in 
proportion to the direct contributions of the two independent 
variables involved. In the absence of a better procedure, this 
method may be employed, but the total coefficients of determination 
thus obtained should be regarded as only approximations. 
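
The proportional division attributed to Heilman can be sketched as follows. The function name and the numerical values are hypothetical, and the code reflects only the rule as it is stated in the paragraph above: each joint contribution is split between its two variables in the ratio of their direct contributions.

```python
def total_determination(direct, joint):
    """Add to each direct coefficient its proportional share of the joint ones.

    direct: dict {i: d_0i}
    joint: dict {(i, j): d_0ij}
    """
    totals = dict(direct)
    for (i, j), d_ij in joint.items():
        # Share of the joint term credited to variable i.
        share_i = direct[i] / (direct[i] + direct[j])
        totals[i] += d_ij * share_i
        totals[j] += d_ij * (1.0 - share_i)
    return totals


# Hypothetical values (assumptions for illustration only).
direct = {1: 0.20, 2: 0.10}
joint = {(1, 2): 0.06}

totals = total_determination(direct, joint)
print({k: round(v, 4) for k, v in totals.items()})
```

The division leaves the grand total unchanged; it only reallocates the joint terms, which is why the resulting totals are best read as approximations.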

After the total coefficients of determination have been obtained, 
our ignorance concerning what our measures actually 
represent creates a difficulty. Suppose the independent variables 
are scores on an intelligence test and scores on a silent reading 
test and that the total coefficients of determination are 
found to be .30 and .40. To say that what is measured by the 
intelligence test contributes 30 per cent of the variance of the 
dependent variable is not satisfactory because we do not know 
what the intelligence test measures. Obviously a portion of the 
complex of abilities it measures is also measured by the silent 
reading test. More meaningful findings would be coefficients of 
determination for the factor of the intelligence test scores that 

constructing the representation illustrated in Figure 6. They are identical, 
except for symbolism, with the normal equations given on page 332. 

1 See pages 330-32. The method developed by Griffin is economical. 
is uncorrelated with the silent reading test scores, the factor 
common to the two variables, and the factor of the silent reading 
test scores that is uncorrelated with the intelligence test 
scores. In explaining the general problem for which he developed 
the path coefficient technique, Wright refers to such causes, 
which he designates as “remote,” but he does not give any technique 
for identifying and measuring their contributions. Until 
we are able to identify the remote (elemental) causes and to 
secure measures of their contributions, an investigator is restricted 
in interpreting coefficients of determination derived 
from the coefficients of a multiple regression equation. 

Dependability of coefficients of determination derived from 
the coefficients of the multiple regression equation. In deriving 
the equation 

$$1 = d_{01.2} + d_{02.1} + d_{012} + d_{0u}$$

the multiple regression equation was used. (See page 391.) 
It gives only the best estimates of $x_0$ to be obtained from $x_1$ and 
$x_2$. This condition operates to reduce the calculated value of 
the coefficients of determination. What happens may be 
illustrated by using a set of variables built up from counts of 
coin tosses as follows: 


$$X_0 = A_1 + A_2 + A_3 + A_4 + A_5$$

$$X_1 = A_1 + A_2 + A_6$$

$$X_2 = A_1 + A_3 + A_7$$

$$X_3 = A_1 + A_4 + A_8$$

The set-up represented by these variables is somewhat similar to 
that which we have when $X_0$ represents measures of achievement 
and $X_1$, $X_2$, and $X_3$ are measures of abilities that contribute 
to this achievement. The component $A_1$ may be thought 
of as corresponding to Spearman’s “g” factor.¹ The components 
$A_6$, $A_7$, and $A_8$ may be thought of as representing variable 
errors of measurement and validity. By employing the path 


1 See pages 402-03 
coefficient technique, the following values were obtained for the 
coefficients of determination: 

$d_{01}$ = .1901 
$d_{02}$ = .1116 
$d_{03}$ = .0400 
$d_{012}$ = .1044 
$d_{013}$ = .0682 
$d_{023}$ = .0536 
Total = .5679 

Subtracting this total from 1.00, we have $d_{0u}$ = .4321. By the 
definition of $X_0$, $u = A_5$. Direct calculation gives $d_{0A_5}$ = .100. 
This means that the calculated values for $d_{01}$, $d_{02}$, etc., are too 
small. Their total should be .90 instead of .5679. The attenuation 
of the calculated values is due to the use of “best” estimates 
in forming the equation of relationship, which, in this 
case, would be 

$$X_0 = b_{01.23}X_1 + b_{02.13}X_2 + b_{03.12}X_3 + u$$

This condition should be kept in mind when the coefficients of 
determination are calculated. 
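
The attenuation can be reproduced in a simplified form. The sketch below assumes that all eight components have equal variance and are mutually uncorrelated. That assumption is not stated in the text and evidently differs from the authors' data, so the figures it produces are not the ones reported above; only the direction of the effect is illustrated. Because the three independent variables are then interchangeable, each normal equation reduces to $r_0 = \beta(1 + 2r)$, where $r$ is the correlation between any two independent variables.

```python
from math import sqrt

# Assumption: components A1..A8 have equal variance and are uncorrelated.
# Each variable is identified by the set of component indices it sums.
components = {
    "x0": {1, 2, 3, 4, 5},
    "x1": {1, 2, 6},
    "x2": {1, 3, 7},
    "x3": {1, 4, 8},
}


def corr(a, b):
    """Correlation of two sums of equal-variance, uncorrelated components."""
    shared = len(components[a] & components[b])
    return shared / sqrt(len(components[a]) * len(components[b]))


r0 = corr("x0", "x1")   # the same for x2 and x3 by symmetry
rxx = corr("x1", "x2")  # the same for every pair of independent variables

# By symmetry each normal equation reads r0 = beta * (1 + 2 * rxx).
beta = r0 / (1 + 2 * rxx)

direct_total = 3 * beta ** 2               # three direct terms
joint_total = 3 * (2 * rxx * beta ** 2)    # three joint terms
calculated_total = direct_total + joint_total

# Share of the variance of x0 actually due to A1..A4 (all but u = A5).
true_total = 4 / 5

print(round(calculated_total, 4), round(true_total, 4))
```

Under these assumptions the calculated total falls well short of the share that the components $A_1$ to $A_4$ actually contribute, which is the same kind of shortfall as that between .5679 and .90 in the authors' example.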

Concept of elemental causes. On pages 397-98 the point was 
made that coefficients of determination for uncorrelated causes 
would be more meaningful statistics. Such causes may be 
described as elemental with respect to the group of independent 
variables considered.¹ Let $a_1, a_2, a_3, a_4, a_5$, and $a_6$ represent 
causes contributing to $x_0$ which are elemental with respect to 
the given independent variables. Also, let $s_0$ represent the sum 
of all contributing factors that are unrelated to the given independent 
variables and let $e_0$ represent the chance factor included 
in the dependent variable. Assuming that $x_0$ is a linear function 
of its causes, it may be expressed as 

$$x_0 = c_{01}a_1 + c_{02}a_2 + c_{03}a_3 + c_{04}a_4 + c_{05}a_5 + c_{06}a_6 + c_{07}s_0 + c_{08}e_0$$

1 They are also elemental with respect to $x_0$, here designated as the dependent 
variable. When elemental causes are being considered, any variable of the group 
might be designated as dependent. 
The independent variables may also be expressed as weighted 
sums. For example, they might be defined as follows: 

$$x_1 = c_{11}a_1 + c_{13}a_3 + c_{14}a_4 + c_{17}s_1 + c_{18}e_1$$

$$x_2 = c_{22}a_2 + c_{23}a_3 + c_{24}a_4 + c_{25}a_5 + c_{27}s_2 + c_{28}e_2$$

$$x_3 = c_{31}a_1 + c_{33}a_3 + c_{35}a_5 + c_{37}s_3 + c_{38}e_3$$

$$x_4 = c_{41}a_1 + c_{43}a_3 + c_{44}a_4 + c_{46}a_6 + c_{47}s_4 + c_{48}e_4$$

$$x_5 = c_{52}a_2 + c_{53}a_3 + c_{56}a_6 + c_{57}s_5 + c_{58}e_5$$

This analytical pattern is probably typical of those encountered 
in educational research. However, one cannot assume the absence 
of an elemental component in any of the independent 
variables. Hence, the blank spaces must be assumed to be filled 
until there is convincing evidence that certain $c$’s (factor loadings) 
are zero. 

If additional independent variables are introduced into a 
given situation, the number of $a$’s may be increased, but it is 
reasonable to assume that eventually all possible $a$’s will be 
represented in the pattern. In such a situation, the $a$’s may be 
thought of as elemental causes in an absolute sense. It should 
be noted that the relationship between the dependent variable 
and the elemental causes is always one of cause and effect. 
Hence, in studying the contributions from elemental causes, it 
is not necessary to demonstrate that the independent variables 
are causally related to the dependent variable. 

Measurement of the contributions from elemental causes. 
The contribution of $a_1$ to $x_0$ is given by the variance ratio 
$c_{01}^2\sigma_{a_1}^2/\sigma_0^2$ and the contribution of the other $a$’s by corresponding 
variance ratios. If the $a$’s, $s_0$, and $e_0$ are expressed in terms of 
standard units, $\sigma_{a_1}^2 = \sigma_{a_2}^2 = \sigma_{a_3}^2 = \cdots = \sigma_{s_0}^2 = \sigma_{e_0}^2 = 1$. Furthermore, 
the $c$’s may be chosen so that $\sigma_0^2 = 1$. Hence, the variance 
ratios reduce to squares of the $c$’s and we have 

$$1 = c_{01}^2 + c_{02}^2 + c_{03}^2 + c_{04}^2 + c_{05}^2 + c_{06}^2 + c_{07}^2 + c_{08}^2$$

The problem of measuring the contributions of the elemental 
components to the dependent variable is one of determining the 
squares of the $c$’s in the equation of the dependent variable. 
Similar equations may be written for each of the independent 
variables. A reliability coefficient is equal to 1.00 minus the 
square of the factor loading of the corresponding chance factor. 
For example, 

$$r_{00} = 1.00 - c_{08}^2$$

The correlation between two $x$’s is expressible in terms of the $c$’s.¹ 
For example, in the situation illustrated here 

$$r_{01} = c_{01}c_{11} + c_{02}c_{12} + c_{03}c_{13} + c_{04}c_{14} + c_{05}c_{15} + c_{06}c_{16}$$
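
A brief numerical sketch may make the last two relationships concrete. The loadings below are hypothetical, and the variables and factors are assumed to be in standard units with uncorrelated factors, so that only the loadings on the shared elemental causes $a_1$ to $a_6$ enter the correlation.

```python
def correlation_from_loadings(c_a, c_b):
    """Sum of products of loadings on the shared elemental causes."""
    return sum(ca * cb for ca, cb in zip(c_a, c_b))


def reliability(chance_loading):
    """1.00 minus the square of the loading on the chance factor."""
    return 1.0 - chance_loading ** 2


# Hypothetical loadings on a1..a6 (assumptions for illustration only).
c0 = [0.5, 0.3, 0.4, 0.2, 0.3, 0.1]   # c01 .. c06
c1 = [0.6, 0.0, 0.5, 0.3, 0.0, 0.0]   # c11 .. c16, zeros for absent causes

r01 = correlation_from_loadings(c0, c1)
r00 = reliability(0.3)  # hypothetical chance loading c08 = 0.3
print(round(r01, 4), round(r00, 4))
```

A zero loading simply removes the corresponding product, which is why the blank spaces in the pattern matter for the intercorrelations.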

When the number of equations thus formed is equal to the 
number of $c$’s, it is theoretically possible to determine their 
values. However, one does not know in advance how many $c$’s 
are involved and hence the determination of their values cannot 
be accomplished by writing and solving a set of simultaneous 
quadratic equations. The problem is that of obtaining a unique 
determination of the values of the $c$’s from the intercorrelations. 
Kelley² has proposed a method of successive approximations 
based upon least squares, but Holzinger³ has shown that 
several different solutions may be fitted to Kelley’s data. Recently 
Thurstone⁴ has developed a technique which gives, in 
the case of certain factor patterns, a unique determination of 
the $c$’s (factor loadings) for the factors common to two or more 

1 Provided the $x$’s and their factors are in terms of standard units. 

2 Kelley, T. L. Crossroads in the Mind of Man. Stanford University, California: 
Stanford University Press, 1928, pp. 122 f. 

Another reference dealing with the same problem is Hotelling, Harold. 
“Analysis of a Complex of Statistical Variables into Principal Components,” 
Journal of Educational Psychology, 24: 417-41, 498-520, September, October, 
1933. 

3 Holzinger, Karl J., and Swineford, Frances. “Uniqueness of Factor Patterns,” 
Journal of Educational Psychology, 23: 247-58, March, 1932. 

4 Thurstone, L. L. The Theory of Multiple Factors. Ann Arbor, Michigan: 
Edwards Brothers, Inc., June, 1932. 65 pp. 

Thurstone, L. L. A Simplified Multiple Factor Method and an Outline of the 
Computations. Ann Arbor, Michigan: Edwards Brothers, 1933. 26 pp. 

Thurstone, L. L. “Vectors of Mind,” Psychological Review, 41: 1-32, January, 
1934. 

Thurstone, L. L. “Unitary Traits,” Journal of General Psychology, 11: 126-32, 
July, 1934. 

A more comprehensive reference is Thurstone, L. L. The Vectors of Mind. 
Chicago: University of Chicago Press, 1935. 266 pp. 
of the variables. The square of the $c$ for the specific factor of a 
variable may be obtained by subtracting from unity the sum of 
the squares of the $c$’s of the common factors included in the 
variable and the square of the factor loading of the chance factor 
obtained by means of the coefficient of reliability. 

Determining the presence of a component common to four or 
more variables. The question of the conditions under which a 
group of intercorrelated variables may be considered to involve 
a common component attracted the attention of Spearman 
thirty years ago.¹ He noted that in some cases the coefficients 
of intercorrelation between variables tended to fit a mathematical 
formula, called the tetrad equation, which in the case of four 
variables may be written 

$$\frac{r_{12}}{r_{13}} = \frac{r_{24}}{r_{34}}$$

This may be changed to the form 

$$r_{12}r_{34} - r_{13}r_{24} = 0$$

The left-hand member of the equation, designated by the 
symbol $t_{1234}$, is called a tetrad difference. For four variables, 
there are three tetrad differences: 

$$t_{1234} = r_{12}r_{34} - r_{13}r_{24}$$

$$t_{1243} = r_{12}r_{34} - r_{14}r_{23}$$

$$t_{1342} = r_{13}r_{24} - r_{14}r_{23}$$

Spearman² formulated the theorem that when the tetrad 

1 Spearman, C. “General Intelligence Objectively Determined and 
Measured,” American Journal of Psychology, 15: 201-92, 1904. 

2 For the first formulation, see Spearman, op. cit. The formulation given in 
his The Abilities of Man, pp. 74-75, is essentially the same as the earlier one. 
See also Spearman, C. “What the Theory of Factors Is Not,” Journal of Educational 
Psychology, 22: 112-17, February, 1931. 

The reader who is interested in the mathematical proof of the theorem should 
consult the Appendix of Spearman’s The Abilities of Man or Line, W., and 
Hedman, H. B. “A Simplified Statement of the Two-Factor Theory,” Journal 
of Educational Psychology, 24: 195-220, March, 1933. This reference gives a 
bibliography for a more extensive study. A bibliography up to 1928 is given by 
Walker, H. M. Studies in Statistical Method. Baltimore: The Williams and 
Wilkins Company, 1929, Chapter VI. For a number of theorems, including this 
one, see Kelley, T. L. Crossroads in the Mind of Man. Stanford University, 
differences for a group of variables are equal to zero, we have a 
necessary and sufficient condition for the conclusion that the 
several abilities represented by the measures include a common 
factor and that each ability is made up of this common factor 
(called “g”) and a specific factor¹ (called “s”). 
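
The vanishing of the tetrad differences under a single common factor can be checked numerically. In the sketch below the loadings on “g” are hypothetical, and each correlation is taken as the product of the two loadings, which is what the two-factor pattern implies when the specific factors are uncorrelated.

```python
from itertools import combinations


def tetrad_differences(r, a, b, c, d):
    """Return the three tetrad differences for the variables a, b, c, d.

    r: dict mapping a pair (i, j) with i < j to the correlation r_ij.
    """
    def rr(i, j):
        return r[(min(i, j), max(i, j))]

    t1 = rr(a, b) * rr(c, d) - rr(a, c) * rr(b, d)
    t2 = rr(a, b) * rr(c, d) - rr(a, d) * rr(b, c)
    t3 = rr(a, c) * rr(b, d) - rr(a, d) * rr(b, c)
    return t1, t2, t3


# Hypothetical loadings of four tests on "g" (assumptions for illustration).
g = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6}

# Under one common factor plus specifics, r_ij = g_i * g_j.
r = {(i, j): g[i] * g[j] for i, j in combinations(sorted(g), 2)}
tetrads = tetrad_differences(r, 1, 2, 3, 4)

# Disturbing a single correlation makes some tetrads depart from zero.
r_disturbed = dict(r)
r_disturbed[(1, 2)] += 0.10
tetrads_disturbed = tetrad_differences(r_disturbed, 1, 2, 3, 4)

print(tetrads, tetrads_disturbed)
```

In actual data the differences would not be exactly zero even when the pattern holds, which is the sampling question taken up in the text that follows.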

When a group of variables have a common component and 
no group factors, the tetrad differences calculated from an actual 
table of correlation coefficients are likely not to be precisely 
zero, especially if the number of cases is not large.² Any condition 
that influences the value of the calculated coefficient of 
correlation³ is likely to influence the tetrad differences in which 
it appears. If the data form a random sample, and the investigator 
desires to generalize to the larger population or universe, 
the effect of chance must be considered. Hence, the question 
arises in regard to the deviation of the tetrad differences from 
zero that may exist when the variables actually consist of only 
a general factor and a specific factor. The probable effect of 
chance may be calculated⁴ as in the case of other statistics. 

California: Stanford University Press, 1928, Chapter III. Proposition 10 is 
Spearman’s theorem. 

1 A specific (unique) factor is uncorrelated with the common factor and with 
the specific factors in the other variables. 

2 Spearman, C. “Disturbers of Tetrad Differences: Scales,” Journal of 
Educational Psychology, 21: 559-73, November, 1930. 

For illustrations of distributions of tetrad differences, see Rogers, K. H. 
“‘Intelligence’ and ‘Perseveration’ Related to School Achievement,” Journal 
of Experimental Education, 2: 35-43, September, 1933. 

Line, W., and Kaplan, E. “Variation in I.Q. at the Preschool Level,” 
Journal of Experimental Education, 2: 95-100, December, 1933. 

Rogers, K. H. “Perseveration in a Group of Subnormal Children,” Journal 
of Experimental Education, 2: 301-09, March, 1934. 

3 See pages 101 f., 151, and 154. 

4 Several formulae have been developed. 

Spearman, C. The Abilities of Man. New York: The Macmillan Company, 
1927, Appendix, p. xi (Formula 16a). The proof of this formula has been 
promised but not published. 

Kelley, T. L. Crossroads in the Mind of Man. Stanford University, California: 
Stanford University Press, 1928, p. 49. 

Moul, M., and Pearson, K. “The Mathematics of Intelligence. I. The 
Sampling Errors in the Theory of a Generalized Factor,” Biometrika, 19: 246-91, 
1927. 

Wishart, John. “Sampling Errors in the Theory of Two Factors,” British 
Journal of Psychology, 19: 181-87, June, 1928. 

An empirical test of the formulae has been reported by Garrett, H. E. “The 
Thurstone has stated as a general theorem that the necessary 
number of uncorrelated (orthogonal) components (factors) in a 
group of variables is equal to the “rank” of the table of their 
intercorrelations considered as a determinant.¹ The “rank” of 
a determinant is “the order of the highest order of minors, all of 
which are not equal to zero.”² This means that if all of the 
fourth-order minors are equal to zero but all of the third-order 
minors are not equal to zero, then the table of intercorrelations 
may be explained by the presence of three uncorrelated components. 
Since the tetrad differences are merely the second-order 
minors, Spearman’s theorem is a special case of Thurstone’s 
theorem.³ 
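
The connection between rank and the number of common factors can be illustrated with exact arithmetic. The sketch below makes an assumption that the text does not spell out: the diagonal cells of the table are taken to hold the variance due to the common factors alone, so that the table is the product of the loading matrix and its transpose. The loadings are hypothetical.

```python
from fractions import Fraction as F


def rank(matrix):
    """Rank of a matrix by exact Gaussian elimination on fractions."""
    m = [[F(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    rnk = 0
    for col in range(cols):
        pivot = next((i for i in range(rnk, rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rnk], m[pivot] = m[pivot], m[rnk]
        for i in range(rnk + 1, rows):
            f = m[i][col] / m[rnk][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[rnk])]
        rnk += 1
    return rnk


def table_from_loadings(loadings):
    """Entry (i, j) is the sum of products of the loadings of i and j."""
    return [[sum(a * b for a, b in zip(li, lj)) for lj in loadings]
            for li in loadings]


# Hypothetical loadings of five variables (assumptions for illustration).
one_factor = [(F(9, 10),), (F(4, 5),), (F(7, 10),), (F(3, 5),), (F(1, 2),)]
two_factors = [
    (F(9, 10), F(0)),
    (F(4, 5), F(1, 5)),
    (F(7, 10), F(3, 10)),
    (F(3, 5), F(2, 5)),
    (F(1, 2), F(1, 2)),
]

rank_one = rank(table_from_loadings(one_factor))
rank_two = rank(table_from_loadings(two_factors))
print(rank_one, rank_two)
```

With one common factor every second-order minor vanishes and the rank is one; adding a second uncorrelated factor raises the rank to two.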

Contributions of factors common to three or more independent 
variables. When it is apparent that a group of variables includes 
a common factor and no group factor, the correlation of 
the common factor with any one of the variables may be calculated 
by the formula⁴ 

$$r_{ag} = \sqrt{\frac{A^2 - A'}{T - 2A}}$$

Sampling Distribution of the Tetrad Equation,” Journal of Educational 
Psychology, 24: 536-42, October, 1933. 

1 Thurstone, L. L. The Theory of Multiple Factors. Ann Arbor, Michigan: 
Edwards Brothers, Inc., 1932, p. 20. 

2 “Equal to zero” is to be interpreted in the sense that a group of tetrad 
equations are considered equal to zero. 

3 The reader interested in factor analysis will find the following references 
helpful: 

Chant, S. N. F. “Multiple Factor Analysis and Psychological Concepts,” 
Journal of Educational Psychology, 26: 263-72, April, 1935. 

Russell, O. R. “Some Observations on Multiple-Factor Analysis,” Journal 
of Educational Psychology, 26: 284-85, April, 1935. 

Thomson, G. H. “The Definition and Measurement of ‘g’ (General Intelligence),” 
Journal of Educational Psychology, 26: 241-62, April, 1935. 

4 Spearman, C. The Abilities of Man. New York: The Macmillan Company, 
1927, Appendix, p. xvi. 

For applications of this formula, see Holzinger, K. J. “Thorndike’s C.A.V.D. 
Is Full of g,” Journal of Educational Psychology, 22: 161-66, March, 1931. 

Cairns, G. J. “An Analytical Study of Mathematical Abilities,” The Catholic 
University of America, Educational Research Monographs, Vol. 6, No. 3. Washington, 
D. C.: The Catholic University Press, 1931. 104 pp. 

Cairns applied the tetrad technique and certain supplementary procedures to 
data obtained by administering eighteen tests. 
$A$ is the sum of the intercorrelations between Test $a$ and every 
other test of the group, $A'$ is the sum of the squares of these 
correlations, and $T$ is the total of all intercorrelations. The 
coefficient $r_{ag}$ is a measure of the extent to which the scores of 
a test are “saturated” with “g” and has been called the “coefficient 
of saturation.” These correlations are equivalent¹ to 
the factor loadings of the first factor resulting from analysis of 
a correlation matrix by means of the “centroid” method developed 
by Thurstone.² Continuation of this analysis results in 
the determination of the factor loadings³ of any other common 
factors which may be present among the variables, but the 
factor loadings thus secured must be subjected to further mathematical 
treatment if a “unique” solution is to be obtained. 
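
The saturation formula can be tried on an artificial table built from known loadings. The sketch rests on two assumptions: the loadings are hypothetical, and $T$ is taken as the sum over the whole symmetric table with the diagonal excluded, so that each intercorrelation is counted twice. With that reading the formula recovers the loadings exactly when the table arises from a single common factor.

```python
from math import sqrt


def saturation(r, a):
    """Correlation of test a with the common factor.

    r: full symmetric table of intercorrelations (list of lists);
       the diagonal entries are ignored.
    a: index of the test.
    """
    n = len(r)
    A = sum(r[a][j] for j in range(n) if j != a)
    A_prime = sum(r[a][j] ** 2 for j in range(n) if j != a)
    # Assumption: T counts every off-diagonal cell of the symmetric table.
    T = sum(r[i][j] for i in range(n) for j in range(n) if i != j)
    return sqrt((A * A - A_prime) / (T - 2 * A))


# Hypothetical loadings on "g" (assumptions for illustration only).
g = [0.9, 0.8, 0.7, 0.6]
r = [[1.0 if i == j else g[i] * g[j] for j in range(4)] for i in range(4)]

recovered = [saturation(r, a) for a in range(4)]
print([round(v, 4) for v in recovered])
```

If $T$ were instead taken as the sum over distinct pairs only, the recovery would no longer be exact, which is why the reading of $T$ is flagged as an assumption here.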

Applications of correlation analysis. The techniques described 
in the preceding pages have a number of applications. 
Multiple regression equations with beta coefficients are used to 
determine the appropriate weights to be assigned to the sub-tests 
of a battery in computing the total score.⁴ Since the beta 
coefficients are equal to the path coefficients connecting the sub-tests 
represented by $X_1, X_2 \ldots X_n$ to the criterion $X_0$, the 
coefficients of determination for the independent (direct) contributions 
to the criterion and for the joint contributions of the 
pairs of the sub-tests can easily be obtained. The sum of the 
coefficients of determination for a given group of such tests 
represents the statistical validity⁵ of the battery as determined 

1 If all of the tetrads vanish, the two methods give identical results. 

2 Thurstone, L. L. A Simplified Multiple Factor Method and an Outline of the 
Computations. Ann Arbor, Michigan: Edwards Brothers, 1933. 26 pp. 

3 Factor loadings are the $c$’s previously referred to. They are the correlations 
of the variables with the elemental or primary factors. When these factors are 
orthogonal, or uncorrelated, the factor loadings are the beta coefficients of 
standard score regression equations in which the elemental, or primary, factors 
are the independent variables. The squares of the factor loadings measure the 
proportions of the variance of a given variable due to the primary factors. 

4 See Kelley, T. L. Interpretation of Educational Measurements. Yonkers-on-Hudson: 
World Book Company, 1927, pp. 212-13. 

5 For definitions of reliability and validity in terms of variance, see 

Cureton, E. E. “Errors of Measurement and Correlation,” Archives of Psychology, 
No. 125, 1931, pp. 8-13. 

Cureton, E. E. “Validation against a Fallible Criterion,” Journal of Experimental 
Education, 1: 258-63, March, 1933. 
by the criterion and the separate coefficients of determination 
will picture the contributions of the several sub-tests. A relatively 
high value of a coefficient of determination of the type 
$d_{01.23 \ldots n}$ indicates a sub-test that contributes significantly 
to the measurement of the desired ability or trait. A relatively 
low coefficient of determination of the type $d_{012.34 \ldots n}$ indicates 
relatively little overlapping in function by the sub-tests represented 
by $X_1$ and $X_2$. 

Factor analysis appears to offer a means of isolating and 
identifying the human traits and abilities that we wish to 
measure. It is not unreasonable that eventually a group of 
elemental abilities and traits may be identified and described 
which will bear somewhat the same relation to achievement in 
the various fields that chemical elements bear to chemical compounds. 
Such determinations will contribute materially to the 
construction of educational tests. Factor analysis also appears 
to have possibilities in connection with problems having to do 
with cause and effect relationships. 

Variance ratios, factor loadings, and some of the other terms 
introduced are relatively new, and an investigator employing 
correlation analysis should make certain that he understands 
the interpretation of the results yielded by these techniques. 
On page 372 it was pointed out that the contribution from a cause 
is to the variability (variance) and not to the raw measures of 
the variable. This means that the results obtained by means of 
correlation analysis are for a population of certain specifications. 
Hence, it is important that the collection of data be wisely 
planned. Otherwise the results obtained may not be for the 
population in which one is interested. 

ILLUSTRATIONS OF THE APPLICATION OF THE TECHNIQUES 
OF CORRELATION ANALYSIS 

The references of this list were selected to illustrate the application of the 
techniques of correlation analysis. The reader, however, should not consider 
them model studies. In some cases the technique employed probably 
failed to accomplish what was desired. 

Adams, H. F. “A Non-Intellectual General Factor,” Journal of Educational 
Psychology, 23: 173-78, March, 1932. 
This study is presented by its author as evidence that tetrad technique is 
applicable to correlations relating to other phenomena than mental abilities. 
The data used were “scores made by the most outstanding amateur and 
professional golfers in the three major tournaments held in this country 
during the past eleven years.” Length of holes, as a common factor, is 
shown to be similar to Spearman’s “g.” 

Baldwin, B. T. “The Relation between Mental and Physical Growth,” 
Journal of Educational Psychology, 13: 193-203, April, 1922. 
Zero order coefficients are reported for physical and mental traits of 
49 girls and partial correlation was employed to eliminate chronological 
age. A range of several years was eliminated by this technique, but the 
regression of height on age appears markedly curvilinear as shown by 
“Growth Curves in Height.” On the other hand, for the age range studied, 
the mental growth curves are apparently linear. 

Burks, B. S. “The Relative Influence of Nature and Nurture upon Mental 
Development; a Comparative Study of Foster Parent-Foster Child 
Resemblance and True Parent-True Child Resemblance,” The Twenty-Seventh 
Yearbook of the National Society for the Study of Education, 
Part I. Bloomington, Illinois: Public School Publishing Company, 
1928, pp. 219-316. 

On pages 299-304 is reported a path coefficient study in which child 
intelligence is the dependent variable and measures of parental intelligence 
and of environment are the independent variables. In place of rationing 
off the joint influence of the independent variables according to the weights 
of the direct influences, Burks computed a coefficient of part determination 
in an effort to show the total contribution of parental intelligence to the 
variance of child intelligence, where only those aspects of environment are 
held constant which are independent of parental intelligence. 

Cairns, G. J. “An Analytical Study of Mathematical Abilities,” The 
Catholic University of America, Educational Research Monographs, 
Vol. 6, No. 3. Washington: The Catholic University Press, 1931. 
104 pp. 

The tetrad technique was applied in this study to intercorrelations 
between scores on eighteen tests. In an effort to locate group factors, 
“g” was eliminated from these intercorrelations. Where the net coefficients 
appeared of significant magnitude, the reference variable technique was 
applied on the hypothesis that tetrad differences significantly different from 
zero would locate a group factor in the two variables other than the reference 
ones. 
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CoLLiNSj J. E. ‘‘The Intelligence of School Children and Paternal Occupa- 
tion/' Journal of Educational Research, 17 157-69, March, 1928 
The problem of this study was to determine the degree of relationship 
between intelligence of children and the occupational status of the father 
No coefficients of correlation are reported, but the nature of the lelationship 
IS probably more adequately indicated in tables and graphs For example, 
the interquartile ranges of intelhgence quotients of gioups of childien, 
classified according to the followmg occupational groups agriculture, un- 
skilled labor, skilled labor, foreman, trade, clerical, manageiial, and profes- 
sional, indicates a concomitant increase of intelligence with variation of 
occupation from unskilled labor to the managerial type 

Denworth, K. M. “The Effect of Length of School Attendance upon 
Mental and Educational Ages,” The Twenty-Seventh Yearbook of the 
National Society for the Study of Education, Part II. Bloomington, 
Illinois: Public School Publishing Company, 1928, pp. 67-91. 
Coefficients of partial correlation and standard score regression equations 
are reported in this study. The author recognizes the limitations of the 
partial correlation technique and restricts her interpretation of the regression 
equations to their prognostic significance. 

Herriott, M. E. “Attitudes as Factors of Scholastic Success,” University 
of Illinois Bulletin, Vol. 27, No. 2, Bureau of Educational Research 
Bulletin, No. 27. Urbana: University of Illinois, 1929. 72 pp. 
The groups studied consisted of 260 students of educational psychology 
and 113 students of technique of teaching. The variables measured included 
intelligence, reading ability, study habits and attitudes. The partial 
correlation technique was employed up to the calculation of tenth-order 
partials. 

Holzinger, K. J. “Thorndike’s C.A.V.D. Is Full of g,” Journal of Educational 
Psychology, 22:161-66, March, 1931. 
In this study, the tetrad technique is applied to intercorrelations between 
C.A.V.D., Otis Self-Administering Test, Terman Group Test, and Stanford 
Binet. The correlations of each with g are reported respectively: .960, .921, 
.960, and .817. Hence, C.A.V.D. is “full of g.” 
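
The tetrad criterion behind this kind of study can be sketched briefly. If a single general factor accounts for all the intercorrelations, each correlation is the product of the two tests' correlations with g, and every tetrad difference vanishes. The sketch below is only an illustration under that assumption: the function names are hypothetical, the matrix is the one implied by the four correlations with g quoted above and not Holzinger's observed data, and the added 0.05 "group factor" is invented for contrast.

```python
def tetrad_differences(r, a, b, c, d):
    """Return the three tetrad differences for variables a, b, c, d of matrix r."""
    return (
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
    )


def one_factor_correlations(loadings):
    """Correlations implied by one general factor: r_ij = g_i * g_j."""
    n = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]


# Correlations with g as quoted in the annotation (C.A.V.D., Otis, Terman, Binet).
g_loadings = [0.960, 0.921, 0.960, 0.817]

# Matrix implied by those loadings alone (an assumption, not observed data).
implied = one_factor_correlations(g_loadings)
pure_g = tetrad_differences(implied, 0, 1, 2, 3)

# Hypothetical contrast: tests 0 and 1 share an extra group factor,
# which raises only their mutual correlation.
with_group = [row[:] for row in implied]
with_group[0][1] += 0.05
with_group[1][0] += 0.05
group_case = tetrad_differences(with_group, 0, 1, 2, 3)

print("one general factor:", pure_g)
print("with a group factor:", group_case)
```

In the first case all three differences are zero apart from rounding. In the second, the two differences that involve the correlation between tests 0 and 1 depart from zero while the third does not, which is the pattern used to locate a group factor.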

Kelley, T. L. Crossroads in the Mind of Man. Stanford University, 
California: Stanford University Press, 1928. 238 pp. 

In this comprehensive study of factor theories, Kelley presents his 
criticisms of Spearman’s work, states a number of propositions with respect 
to general and specific factors in terms of varying numbers of variables, 
develops techniques and applies them to data collected from kindergarten, 
third-grade, and seventh-grade children. 

Kelley, T. L. The Influence of Nurture upon Native Differences. New 
York: The Macmillan Company, 1926. 49 pp. 

After carefully defining the terms “nature” and “nurture,” Kelley 
develops certain techniques for use in differentiating between nature and 
nurture factors and in abstracting the nurture factor. He applies his 
techniques to data obtained from eight-, eleven-, and fourteen-year-old 
children. 

Line, W., Rogers, K. H., and Kaplan, E. “Factor Analysis Techniques 
Applied to Public-School Problems,” Journal of Educational Psychology, 
25:58-65, January, 1934. 

This study includes applications of both Spearman’s and Thurstone’s 
techniques. It is stated that the latter “made possible a more exhaustive 
factorial analysis, although the significance of the subsidiary factors cannot 
yet be stated.” 

Nanninga, S. P. “Costs and Offerings of California High Schools in Relation 
to Size,” Journal of Educational Research, 24:356-64, December, 
1931. 

In this study curvilinear relationships were discovered which necessitated 
the calculation of ratios of correlation and the use of curve-fitting by the 
method of least squares. 

Slocombe, C S “Of Mental Testing — A Pragmatic Theory,” Journal of 
Educational Psychology, 19: 1-24, January, 1928. 

An excellent discussion of the two-factor theory. Worthy of careful 
study by the interested student. 

Snedecor, G. W. “Calculation and Interpretation of Analysis of Variance 
and Covariance,” Monograph Number One, Division of Industrial 
Science, Iowa State College. Ames, Iowa: Collegiate Press, Inc., 1934. 
96 pp. 

This reference deals with the techniques to be employed in analyzing the 
total variance of a set of measurements to show the contributions of various 
factors of classification, or of heterogeneity. These techniques represent an 
extension of the work of R. A. Fisher. (See page 252.) While illustrations of 
the techniques are given from the fields of agriculture and biology, they 
are worthy of application to educational problems. For example, the 
problem of determining whether freshmen classes over a period of years 
represent a homogeneous population with respect to intelligence, or other 
traits, may be attacked through the use of techniques described here. 
Another problem, for which the techniques are applicable, is that of 
determining the extent to which the correlation between certain traits is an 
internal characteristic of several groups, or an inter-group relationship. 

Spearman, C. E. The Abilities of Man. New York: The Macmillan 
Company, 1927. 415 pp. 

In this comprehensive text, Spearman discusses several theories, including 
his own, concerning the nature of intelligence and other human abilities. 
The research of over twenty years on the two-factor theory is summarized 
in this volume. The appendix is an excellent source of information with 
respect to the statistical techniques employed. 

Symonds, P. M. “The Effect of Attendance at Chinese Language Schools 
on Ability with the English Language,” Journal of Applied Psychology, 
8:411-23, December, 1924. 

Bi-serial coefficients of correlation were calculated in this study in an 
effort to show the relationship between attendance and non-attendance at 
Chinese language schools (in Hawaii) and ability in English quantitatively 
measured. Age and intelligence were held constant by means of the partial 
correlation technique. The application of the bi-serial r technique seems 
appropriate. A continuous variable may be assumed to underlie attendance 
and non-attendance at a Chinese language school. Elimination of the 
variable chronological age by partial correlation likewise seems appropriate, 
but it is probable that intelligence “as measured” does not contribute 
completely. Furthermore, the application of partial correlation to coefficients, 
some of which are bi-serial coefficients, seems questionable. The bi-serial r is 
only approximately equivalent to the Pearson product-moment r. 
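
The closing remark of this annotation can be made concrete with a small sketch. It assumes the usual textbook formula for the bi-serial r, (M1 - M0)/s multiplied by pq/y, where y is the ordinate of the unit normal curve at the point dividing the proportions p and q. The Pearson r computed on the same dichotomy coded 0 and 1 (the point-biserial) is shown beside it. The scores and attendance flags are invented for illustration and are not Symonds' data; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def _split(scores, attended):
    """Separate scores into the attending and non-attending groups."""
    yes = [x for x, flag in zip(scores, attended) if flag]
    no = [x for x, flag in zip(scores, attended) if not flag]
    return yes, no


def biserial_r(scores, attended):
    """Bi-serial r, assuming a normal continuous variable underlies the dichotomy."""
    yes, no = _split(scores, attended)
    p = len(yes) / len(scores)
    q = 1.0 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))  # ordinate at the point of division
    return (mean(yes) - mean(no)) / pstdev(scores) * (p * q / y)


def point_biserial_r(scores, attended):
    """Pearson product-moment r with the dichotomy coded 0 and 1."""
    yes, no = _split(scores, attended)
    p = len(yes) / len(scores)
    return (mean(yes) - mean(no)) / pstdev(scores) * sqrt(p * (1.0 - p))


# Invented English-test scores: five attenders followed by seven non-attenders.
scores = [9, 11, 12, 14, 16, 10, 12, 13, 15, 17, 18, 20]
attended = [True] * 5 + [False] * 7

r_bis = biserial_r(scores, attended)
r_pb = point_biserial_r(scores, attended)
print(round(r_bis, 3), round(r_pb, 3))
```

The two coefficients differ by the factor sqrt(pq)/y, which is always greater than one, so the bi-serial value is larger in absolute size than the product-moment value on the same data. This is one reason for caution when the two kinds of coefficient are mixed in a partial correlation.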

DETERMINING WHAT SHOULD BE 

General character of the problem. The problems dealt with 
in the preceding chapters may be classified under three heads: 
(1) What has been? (2) What is? (3) What will be? We 
may also ask what should be, or what is desirable. Problems 
asking what has been are known as historical. We have no 
established designation for those under the second and third 
captions, but in the absence of a more appropriate term they 
may be referred to as “scientific.” Questions that ask what 
should be may be designated as “problems of purposes.” A 
significant characteristic of such problems is suggested by the 
nature of the process of answering them. With reference to 
pupil failures at a particular grade level, we may ask what per 
cent of pupils fail or what will be the effect of a specified change 
in the plan of school organization, or in the method of instruction, 
or in the curriculum. The answers to such questions may 
be derived from objective data and when obtained are regarded 
as facts. A statement of the per cent of pupils that should fail 
represents a judgment. 

The statement that the determination of what should be 
involves judgment is not intended to imply that the answer to 
such a question must be a mere opinion. In making a judgment, 
reflective thinking is involved. The problem is defined, data are 
collected, and consideration is given to the probable consequences 
of adopting different possible courses of action in an 
effort to determine which one is most likely to result in the 
attainment of an end judged to be desirable. For example, if the 
problem is to determine whether the assignment method or the 
project method should be employed in a particular school 
situation,¹ a person should inquire into the various aspects of the 
particular situation (traits of the children, available equipment, 
traditions of the school, attitude of the community, training 
and preferences of the teachers, competence of the supervision, 
etc.) and into the probable effectiveness of the two 
methods including not only the immediate achievements of the 
pupils but also their attitudes and preparation for future work. 
With these considerations in mind he will then decide what 
appears desirable. 

The range of problems of purposes. The field of curriculum 
construction furnishes a large group of problems of purposes, 
but questions that ask what should be arise in other divisions of 
the field of education. The following are illustrative of such 
problems. 

1. What means should be used in securing publicity for a school 
building program? 

2. What is the value of the daily assembly in high school? 

3. How should high school athletic funds be administered? 

4. Should applicants for admission to college be given an entrance 
examination? 

5. Should the college resort to careful selection of students for 
training as teachers? 

6. What would constitute a satisfactory minimum program for 
equalization of educational opportunity for which the state 
should assume responsibility? 

7. What should be the limits of state control of education? 

8. What should be the physical training and health program in the 
elementary school? 

9. What should be the relations between superintendents and 
business managers? 

10. Should foreign language credits be required for college entrance? 

11. To what extent should pupils participate in school administration? 

12. What should be the minimum preparation of junior college 
instructors? 

13. What types of bonds should be issued in financing the construction 
of school buildings? 

¹ The reader should note that the question implied here is not the same as, 
“What is the relative effectiveness of the assignment method and the project 
method?” 

Further consideration of problems of purposes. Many problems 
that ask what should be involve two types of questions: 
(1) a question of objectives and (2) a question of means. For 
example, consider the first question in the preceding list: “What 
means should be used in securing publicity for a school building 
program?” In attempting to answer this question, it is necessary 
to consider first what objective is to be attained as a result 
of the publicity. Is approval of the proposed building program 
by the voters of the district desired? Or, is the objective to secure 
the decision that will be most beneficial to the educational 
interests of the community? After the objective has been decided 
upon, there remains the question of means; that is, how 
may the desired end be most efficiently attained? In some problems 
the subordinate questions are somewhat different. For 
example, the question concerning the value of the daily assembly 
in the high school may be analyzed as follows: (1) What 
is the effect of various types of daily assembly in the high school? 
(2) What is contributed by this effect to the objectives of the 
school? The second question implies the problems of determining 
the objectives of the school. 

The determination of objectives frequently raises many sub- 
ordinate questions. For example, in determining the objectives 
of a high school in a rural area, such questions as the following 
should receive consideration: Should the pupils be educated so 
that they will be better adapted to rural living or should they 
be educated so that they will be better adapted to living in an 
urban area, or so that they will be better adapted to living in an 
environment that may be either urban or rural? If the purpose 
is adaptation to living in rural areas, what type of rural life 
should be postulated? What portion of the total preparation for 
rural living shall be made the responsibility of the high school? 
In a particular situation it will be necessary to consider the 
resources of the community, its probable future development, the 
interests and capacities of the children to be educated, the train- 
ing and experience of available teachers, and other practical 
aspects of the situation. 

How problems of purposes are dealt with. The determination 
of objectives or purposes requires procedures commonly designated 
as philosophical. Objective data may be collected and the 
results of scientific studies may be utilized, but the answer is 
essentially a judgment which represents what is considered to be 
most desirable. In contrast, the procedures employed in dealing 
with problems that ask what is or what will be are designated as 
scientific. The reader, however, should not be misled by this 
use of the terms philosophical and scientific. They are introduced 
as a means of convenience in discussing certain topics and 
not as designating procedures that have nothing in common. 
Questions of means suggest the experimental method, but in 
practice it is seldom feasible to deal with them by this procedure 
and hence the philosophical method is applied to them 
also. 

The philosophical method. In its general outline the method 
of philosophy is fundamentally the same as that described as the 
method of research in Chapter 1. The philosopher defines his 
problem, collects data, formulates hypotheses and verifies 
them. In contrast to the scientific investigator, the philosopher 
deals with a wider range of data. He does not limit himself to 
those that may be collected by means of the techniques described 
in Chapter III. He includes the results of his own observation 
and facts and principles from related fields. He may seek the 
experiences of other persons and their beliefs. In working with 
his data, his method is seldom statistical. In verifying his 
hypotheses he is more concerned with their implications and 
their relations to experience and to general principles.¹ 

¹ For further discussion of the methods of philosophy in contrast to the 
methods of science, see: 

Lepley, Ray. “Dependability in Philosophy of Education,” Teachers College, 
Columbia University Contributions to Education, No. 461. New York: Bureau 
of Publications, Teachers College, Columbia University, 1931, pp. 1-15. 

Buckingham, B. R. “The Philosophy and Organization of Research,” School 
and Society, 29:755-64, June 15, 1929. 

Clugston, H. A., and Davis, R. A. “Is a Scientific Method Possible for 
Philosophical Research?” Educational Administration and Supervision, 12:293-99, 
April, 1930. 

Clugston, H. A., and Davis, R. A. “Suggested Criteria for the Philosophical 

The method of philosophy cannot be described with the definiteness 
that has been possible in the preceding chapters. 
Neither is it possible to subject a procedure of a subjective, or 
non-overt, nature to the same sort of scrutiny that is possible for 
such a technique as that of securing equivalent groups by pair- 
ing pupils on the basis of significant characteristics. We may, 
however, consider the principal phases of the method of philos- 
ophy as it is applied to educational problems that ask what 
should be. 

In defining his problem, the philosopher seeks to identify the 
fundamental questions involved and to formulate the assump- 
tions upon which the solution of the problem is to be based. 
In formulating these assumptions he may be guided by his own 
experiences, but frequently he consults various sources.¹ The 
validity of the assumptions formulated depends on the availability 
and dependability of existing human knowledge relevant 

Method of Research in Education,” Educational Administration and Supervision, 
16:575-80, November, 1930. 

Hullfish, H. G. “The Relation of Philosophy and Science in Education,” 
Journal of Educational Research, 20:159-65, October, 1929. 

Kelley, T. L. “A Defense of Science in Education,” The Harvard Teachers 
Record, 1:123-30, November, 1931. 

Kelley, T. L. “The Scientific versus the Philosophic Approach to the Novel 
Problem,” Science, 71:295-302, March 21, 1930. 

Kilpatrick, W. H. “A Defense of Philosophy in Education,” The Harvard 
Teachers Record, 1:117-22, November, 1931. 

Kilpatrick, W. H. “The Relation of Philosophy to Scientific Research,” 
Journal of Educational Research, 24:97-114, September, 1931. 

¹ Freeman, Reisner, and Peters have contended that psychology, history of 
education, and educational sociology are the sources of fundamental assumptions. 
The writers feel that any relevant body of tested knowledge should be 
consulted. 

Freeman, F. N. “Psychology as the Source of Fundamental Assumptions in 
Education,” Educational Administration and Supervision, 14:371-77, September, 
1928. 

Reisner, E. H. “The History of Education as a Source of Fundamental 
Assumptions in Education,” Educational Administration and Supervision, 
14:378-84, September, 1928. 

Peters, C. C. “Educational Sociology as a Source of Fundamental Assumptions 
in Education,” Educational Administration and Supervision, 14:385-92, 
September, 1928. 

The following article should be read along with those referred to above: 

Bode, B. H. “Where Does One Go for Fundamental Assumptions in Education?” 
Educational Administration and Supervision, 14:361-70, September, 
1928. 

to the problem, and the extent to which this knowledge is intelligently 
brought to bear on the problem. It is also conditioned 
by the intelligence of the investigator, his beliefs and his prejudices 
and by other factors that influence the quality of his 
thinking. 

Interpretation of data in dealing with problems of purposes is 
accomplished by thinking of hypotheses and then checking them 
against all available criteria until the most satisfactory one is 
found. The formulation of hypotheses depends upon a person’s 
sensitiveness to the meaning of his data, the breadth of his 
acquaintance with the field of his problem, his type of mind, and 
his fundamental philosophy of life. It is to be expected that 
some hypotheses will be unsatisfactory. The important considerations 
are that each hypothesis be carefully checked against 
the available criteria and that eventually an acceptable one be 
found. This tested hypothesis is the conclusion, the answer to 
the problem, but the critical worker will regard it as a judgment 
subject to modification in the light of any new data that may 
come to his attention. 

A person employing the philosophical method critically 
examines his procedure and his tentative conclusions. He raises 
such questions as, “Have I been logical in my thinking about 
the matter?” “Have I considered all that is relevant to the 
situation?” “Have I succeeded in being open-minded in arriving 
at a decision?” “Have I suspended judgment long enough 
to arrive at a decision which is a reasonably safe basis for 
action?” Another test of the defensibility of a decision with 
respect to what should be is its fruitfulness in further thinking 
about the matter. Finally, the consequences of acting in accordance 
with the decision provide the ultimate test of the 
defensibility of the decision and the basis for the formulation of 
modifications of the decision. 

The contribution of objective techniques to the solution of 
problems of purposes. Although the determination of what 
should be involves judgment and the total procedure must be 
described as subjective, significant contributions may be obtained 
from objective techniques. Facts and principles are needed for 
making intelligent judgments. For example, when a superintendent 
is considering the desirability of increasing the number 
of teachers in order to reduce the size of classes, he will wish to 
know the effect of size of class upon pupil achievement. He 
probably will desire to know the practices in other school 
systems of similar size and resources. Objective techniques may 
be useful in securing the answer to a problem of purposes after 
an assumption or hypothesis is made with reference to the basis 
from which the answer is to be derived. For example, if the 
hypothesis is made that pupils should learn to spell the words 
that adults use in writing, objective techniques may be employed 
to determine the words that adults use and the frequency of 
their use. 
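
The counting step in the spelling example is simple enough to sketch. The following is a minimal illustration only, assuming that samples of adult writing are available as plain text; the two sample letters and the function name are invented, and the choice of what counts as a word is itself a judgment that the technique does not settle.

```python
import re
from collections import Counter


def word_frequencies(writings):
    """Tally how often each word occurs across a collection of written samples."""
    counts = Counter()
    for text in writings:
        # Treat runs of letters and apostrophes as words, ignoring case.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Invented samples standing in for a collection of adult correspondence.
sample_letters = [
    "Dear Sir: I received your letter and thank you for your order.",
    "Dear Madam: Thank you for the letter. We will ship your order today.",
]

freq = word_frequencies(sample_letters)
for word, count in freq.most_common(5):
    print(word, count)
```

The tally supplies the facts; whether the most frequent words of adult correspondence are the words pupils should learn to spell remains the hypothesis on which the whole procedure rests.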

Philosophical procedures in dealing with problems of science. 
At various places in the preceding chapters the attention of the 
reader has been directed to assumptions that are basic to the 
techniques employed in dealing with questions which ask what 
is or what will be or that are introduced in interpreting the findings 
of such studies. Identifying these assumptions and arriving 
at an estimate of their validity in particular cases calls for the 
methods of philosophy. Hence, in practice the distinction between 
scientific research and “philosophical research” breaks 
down.¹ A person dealing with problems of purposes frequently 
needs the assistance of objective techniques and the worker 
dealing with problems of science should at times engage in 
philosophizing. The need for philosophizing is especially urgent 
in experimental research. When a question is asked concerning 
the relative effectiveness of two procedures or practices, it is 
generally assumed that an experimental solution of the problem 
is possible, but, as pointed out on page 291, this assumption may 

¹ Symonds contends that it is inappropriate to recognize philosophical research 
as a type and his position is defensible if the meaning of research is 
restricted to that of a scientific investigation. However, the use of the term 
may be justified as a convenience. Symonds, P. M. “A Course in the Technique 
of Educational Research,” Teachers College Record, 29:29-30, October, 
1927. 

not appear to be defensible when the problem is adequately 
defined. 

Dependability in science versus dependability in philosophy. 
There is a widespread tendency to consider objectivity of data 
the criterion of the dependability of a conclusion. If the data 
employed are highly objective, many persons classify the con- 
clusion as scientific and pronounce it dependable. On the other 
hand, if the data are obviously subjective, the conclusion is 
called unscientific and is considered lacking in dependability. 
This tendency is not justified. The faults of data were pointed 
out in Chapter V and their significance in the study of certain 
types of problems was emphasized in Chapters IX, X, and XI. 
Objectivity of data does not guarantee dependability of con- 
clusions. When the data are accurate, valid, and adequate with 
respect to the requirements of the problem, a conclusion derived 
from them is dependable. When the data are not entirely satisfactory, 
a conclusion may be shown to be dependable in spite of 
their limitations. Frequently, however, conclusions from research 
employing objective data must be considered lacking in 
dependability. 

A conclusion reached by the methods of science is called de- 
pendable when it appears highly probable that a repetition of 
the investigation would lead to essentially the same conclusion. 
When applied to conclusions reached by means of the methods 
of philosophy, the meaning of dependability requires some 
adaptation. The criteria of dependability are implied in the 
questions: “Was the problem adequately defined?” “Were the 
basic assumptions recognized and understood?” “Were any 
pertinent data overlooked?” “Was the thinking biased or uncritical?” 
“Were the probable consequences of the conclusion 
adequately considered?” The philosopher is handicapped in 
testing his own thinking and even a competent critic may fail 
to detect weaknesses. It is difficult to foresee the consequences 
of a conclusion. In some cases they are revealed only as time 
passes. Hence, the testing of philosophical thinking frequently 
extends into the future. Sometimes the completion of the test 
is deferred a generation or more. For example, the determination 
of the soundness of the conclusion that ability grouping 
should prevail in our schools probably cannot be completed 
until the resulting effect upon society one or more generations 
hence becomes apparent. 

Since dependability of conclusions in philosophy does not 
mean the same as dependability of findings in science, a comparison 
can have only a limited significance. It seems reasonable, 
however, to say that the conclusions reached by a competent 
and critical person, relative to a problem of purposes, 
may be highly dependable and deserving of as much confidence 
as we give to reports of scientific findings in the field of 
education.¹ 

Objective techniques and the problem of curriculum 
construction.² Since curriculum construction furnishes a large 
number of the questions that ask what should be, the application 
of objective techniques in this field will be treated more explicitly. 
Reference is frequently made to “scientific curriculum 
construction” and examination of current curriculum studies 
reveals many applications of objective techniques. A curriculum 
consists of the objectives or goals pupils are expected to attain 

¹ For an extended consideration of the question of the dependability of 
conclusions in philosophical inquiry, see Lepley, Ray. “Dependability in Philosophy 
of Education,” Teachers College, Columbia University Contributions to Education, 
No. 461. New York: Bureau of Publications, Teachers College, Columbia 
University, 1931. 96 pp. The final chapter, in which the author states his conclusion, 
is worthy of careful study by anyone interested in the question. 

² Douglass has reported an illuminating analysis of curriculum research in 
secondary education during 1929. By means of correspondence with instructors 
in colleges and universities and the examination of publications for the year he 
located 74 curriculum researches at the high school level, 40 of which were 
masters’ theses. Approximately half of the studies dealt explicitly with the 
question of what the curriculum should be. With reference to the quality of the 
research, he states that “To a large extent the contributions of research to the 
secondary school curriculum have as yet been trivial and have been based on six 
conceptions which themselves are not established on scientific foundations and 
which to many educators seem definitely unsupportable.” Douglass, H. R. 
“Types and Fields of Curriculum Research in Secondary Education during 
1929,” School Review, 38:656-62, November, 1930. 

The interested reader will find a good general discussion in the following 
reference: 

Judd, C. H. “The Place of Research in a Program of Curriculum Development,” 
Journal of Educational Research, 17:313-23, May, 1928. 

and of the learning exercises and materials of instruction to be 
employed as a means for securing the learning activities whose 
outcomes will be the abilities specified as immediate or control 
objectives. Usually, however, curriculum construction has been 
interpreted to mean only the determination of objectives. Since 
learning exercises and materials of instruction are essentially 
means, the determination of objectives is the basic problem. 
When attempting a determination of objectives, a distinction 
should be made between remote or ultimate objectives and 
immediate objectives. Ultimate objectives are to be conceived 
of in terms of the characteristics (conduct) of the social group 
we desire to build up and perpetuate. For example, if a curriculum 
is being constructed for training plumbers, the ultimate 
objectives would be expressed in terms of the characteristics 
of the ideal plumber. Judgment cannot be avoided in 
determining this ideal plumber. The determination may be 
given the appearance of science by employing techniques of 
activity analysis, but judgment is introduced in selecting the 
plumbers whose activities are analyzed. 

The techniques of activity analysis enable research workers 
to collect information useful in the formulation of ultimate 
objectives. However, judgment may be required in adding 
objectives which the techniques of activity analysis fail to discover. 
For example, objectives which have to do with the contemplation 
of the beautiful and the good and those which may 
be postulated as more important in the future than in the lives 
of adults of the present generation will not be revealed by 
activity analysis. Another limitation of activity analysis, especially 
when applied to adults, is that it neglects the activities 
of pupils which are necessary for an effective organization of 
the curriculum. For example, adults may be found to use a 
certain limited number of arithmetical operations, but children 
may need to use many more arithmetical operations during the 
course of their formal education in order to acquire and retain 
those that they will use as adults. The contention may be 
supported that activity analysis is a useful technique for collecting 
needed factual data, but it is a technique which must 
be used with full recognition of its limitations and with the 
recognition that it cannot be expected to accomplish the 
determination of ultimate objectives without supplementation.¹ 

The techniques of activity analysis. Six techniques of activity 
analysis have been employed. 

1. “Introspection,” in which a participant in the activity lists all 
of the subsidiary activities or duties of which he can think. 

2. “Working on the job,” which is a modified form of introspection. 

3. “Interviewing,” in which a trained interviewer asks a participant 
in the major activity to give a list of his duties. 

4. “Questionnaire,” which is essentially a type of interviewing. 

5. Observing workers and noting the particular duties they perform. 

6. Analyzing records of activities performed. 

Introspection is an effective technique in activity analysis 
when the analyst has had considerable experience with the 
activity. Working on the job is useful where the analyst needs 
to acquire experience with the activity. The questionnaire 
and interview techniques are useful in securing data with respect 
to activities from a representative group of individuals. With 
the exception of representativeness, the data secured by all 
four of these techniques are similar in character. All involve 
introspection, and hence are subjective. The individual reporting 
the activities in which he is engaging, or has engaged, 
is likely to report those activities which are more or less routine 
in character to the neglect of activities that are performed 
only occasionally. These occasional activities may be the ones 
which require the greatest ability. The individual reporting his 
activities may tend to record those which seem to him important 
and neglect to report activities of whose importance he is 
unaware. For example, an auto mechanic might list comprehensively 
the various types of repair jobs in which he has 
engaged and neglect to mention his activities in dealing with 
customers which should be included if the analysis of his vocation 
is to be complete. Furthermore, he may report his activities 
without indicating how they are performed. For example, 
the activity reported as “replacing broken timing gear” may 
represent insufficient analysis and inadequate information with 
respect to other activities engaged in concomitantly. A more 
complete analysis might include: responding promptly to motorist’s 
telephone call for aid; towing car to town with precautions 
to prevent accident; diagnosing correctly what is wrong with 
the car; removing hood and radiator with care to prevent 
marring; jacking up engine; removing fan belt, fan, and gear 
casing; removing broken gear with suitable tools; selecting the 
appropriate new part; adjusting the new gear so that the timing 
will be correct; and so on, in detail, until the list is concluded 
with mention of a courteous word of thanks to the motorist 
and the request to call again. 

¹ For an elaboration of this point see Bode, B. H. Modern Educational 
Theories. New York: The Macmillan Company, 1927, Chapter V. 

The technique of observing workers has the advantage that 
the analyst may give more concentrated attention to the analysis 
of the activity. The worker may not be capable of effective
introspection with respect to his job while performing it
efficiently. The observer is in a somewhat better position to note
the relations between the activities performed and the reactions 
of people served by the activities. The observer may be aided 
in his analysis by the utilization of such apparatus as a stop- 
watch and motion picture camera. The motion picture camera 
is particularly effective where a permanent record is desired 
which may be subjected to a detailed analysis of the motions
used in performing the activity. The data obtained as a
result of observation may be relatively objective in charac- 
ter. 

The analysis of records of activities is a useful technique for 
obtaining information with respect to arithmetical operations 
performed by workers in various fields, words employed in 
written correspondence, and the like. It is also an advantageous
technique in the analysis of activities other than vocational or 
professional. For example, analysis of the records of book 
withdrawals in public libraries yields information relative to 
the types of reading done by adults and by children.
Newspapers and periodicals may be analyzed for the occurrence of
certain types of items. Records of problem solutions in
chemistry may be analyzed for the arithmetical operations needed
in their solution.
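
The tallying that an analysis of records involves can be sketched briefly. What follows is only an illustration under stated assumptions, not the procedure of any study cited in this chapter: the two sample letters are invented, and the function name tally_words is hypothetical.

```python
from collections import Counter
import re


def tally_words(records):
    """Count how often each word occurs across a set of written records."""
    counts = Counter()
    for text in records:
        # Lower-case the text and treat runs of letters/apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Invented business letters standing in for the records being analyzed.
letters = [
    "We received your order and will ship the goods Monday.",
    "Your order was shipped Monday; the invoice is enclosed.",
]

word_counts = tally_words(letters)

# List the most frequent words first, as a frequency table would.
for word, n in word_counts.most_common(5):
    print(word, n)
```

The same tally could be applied to arithmetical operations or to types of newspaper items by replacing the word-splitting step with whatever unit the analyst has chosen to count.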

The method of consensus of opinion.¹ The method of activity
analysis is most useful when objectives are being determined 
for a well-defined occupation or activity. Its usefulness is 
limited when the problem is to determine the objectives of
such school subjects as reading, geography, history, algebra, or 
physics. Expressions of opinions may be secured by means of 
a formal questionnaire, but some of the limitations of this pro- 
cedure can be overcome by seeking out opinions that have al- 
ready been expressed by competent persons. The study by 
Hockett described briefly in Chapter I is a good illustration of 
the latter method. It seems reasonable to assume that the 
problems and issues identified by Hockett do represent with a 
high degree of accuracy the problems and issues confronting the 
present generation of adults, and that many of these problems 
and issues will be applicable to the next generation. 

The method of consensus of opinion as employed by Hockett 
is useful for determining ultimate objectives in several subject- 
matter fields. In many cases it enables the curriculum builder 
to set up objectives more likely to be valid in the preparation 
of children for the activities of adult life, than objectives deter- 
mined merely through analysis of the activities of the present 
generation. It should be noted, however, that a consensus of
opinion is likely to be weighted by tradition. Following the 
report of the Committee of Ten, curriculum construction by 
committees has been a common procedure. The high degree of 
agreement frequently reflected by the report is an indication 
of the influence of tradition upon their thinking.²

1 For a discussion of an assumption basic to a consensus of opinion, see page 255.

2 For an elaboration of this point see:

Kelley, T. L. Scientific Method. New York: The Macmillan Company, 1932,
pp. 152-64.

For a general criticism of consensus of opinion in curriculum construction see:

Bode, B. H. Modern Educational Theories. New York: The Macmillan
Company, 1927, Chapter IV.

Techniques useful in the later stages of curriculum construction.
After the ultimate objectives have been determined,
other problems require attention: the determination of immediate
objectives, the abilities necessary for performing the activities
recognized as ultimate objectives; formulation of learning
exercises that will be instrumental in stimulating and directing
children in their acquisition of the abilities specified as
immediate objectives;¹ the selection of suitable materials of
instruction; and the determination of the placement of learning
exercises and materials of instruction in an effective sequence in
the organized curriculum. These problems call for the determination
of relative merits of comparable means and hence
may be attacked experimentally. The total program of required
experimentation would be an extensive one, and in the case of
the determination of immediate objectives, several years would
be required to complete an experiment. Our schools are in
operation, and when the curriculum revision is decided upon it
seldom seems feasible to plan for a group of experimental studies
extending over a period of years. Consequently, the curriculum
for the next year is constructed by other means. Experimental
studies, however, are steadily contributing information
which makes the construction of the curriculum less subjec- 
tive. 

Experimentation in the psychological laboratory and under 
school conditions has contributed information relative to the 
nature of pupil abilities in several subject-matter fields.² For
example, a number of studies have dealt with the specificity of 
the calculation skills of arithmetic, and, although the findings 
are not entirely consistent, they may be labeled an important 
contribution in the field of curriculum construction. The lab- 
oratory experimentation on the nature of silent reading ability 
conducted at the University of Chicago has had a very significant 
influence on the reading curriculum, particularly on the learning
exercises and materials of instruction. There have also been
numerous experimental studies of the relative merits of types
of learning exercises and their organization. Among those
studied are learning exercises relating to phonics, requests to
learn rules in spelling, exercises in formal grammar, practice
exercises in arithmetic, written assignments, the Dalton Plan
and other contract organizations of learning exercises, projects,
and lecture-demonstrations. A small number of experimental
studies have dealt with the placement of learning exercises and
materials of instruction.

1 This problem includes the optimum adjustment of learning exercises to
individual differences.

2 Research involving the application of correlation analysis described in
Chapter XI is also contributing to this end.

Although the experimentation relating to the curriculum has 
been fragmentary and the number of dependable findings is not 
large, the possibility of such research is apparent and it seems 
reasonable to expect that eventually the curriculum-maker will 
have available sufficient experimental information to make the
later stages of curriculum construction highly objective.¹ It
should be noted, however, that the determination of the ulti- 
mate (conduct) objectives in which philosophical methods 
must be employed is required as a basis for such experimental 
studies. 

Until experimental findings are much more adequate, the 
determination of immediate (control) objectives, the selection 
of learning exercises, and their organization and placement must 
be accomplished mainly by other means. Surveys of present
practices and of opinion, including the analysis of textbooks and
courses of study, will be helpful, but the limitations of the
findings from such studies should be recognized. A consensus of
opinion, as well as the average of present practices, is weighted 
by tradition. The findings of studies of pupil interests and of 
pupil achievement are influenced by present practices. How- 
ever, the curriculum-maker who uses survey findings intelli- 
gently will probably formulate a better curriculum than he 
can make by disregarding such information. 


1 The range of scientific studies contributing to curriculum construction is
wide. For an indication of their scope, see Thorndike, E. L. “Curriculum
Research,” School and Society, 28:569-76, November 10, 1928.

Monroe, W. S., and Herriott, M. E. “Reconstruction of the Secondary
School Curriculum: Its Meaning and Trends,” University of Illinois
Bulletin, Vol. 25, No. 42, Bureau of Educational Research Bulletin,
No. 41. Urbana: University of Illinois, 1928. 120 pp.

This is a comprehensive study of the trends in the reconstruction of the
secondary-school curriculum during the period 1893-1928.

Monroe, W. S., Hindman, D. A., and Lundin, R. S. “Two Illustrations of
Curriculum Construction,” University of Illinois Bulletin, Vol. 25,
No. 26, Bureau of Educational Research Bulletin, No. 39. Urbana:
University of Illinois, 1928. 53 pp.

The illustrations of curriculum construction described in this monograph
involve little or no use of objective data. The procedure employed may be
described as “systematic and critical judgment.” In both cases, the first
major step was to formulate an analytical description of the ultimate or
conduct objectives for which the proposed curriculum was considered to
contribute equipment. From these conduct objectives, the immediate or
control objectives were derived. The last two steps include the predicting
of the learning activities necessary for acquiring the specified controls of
conduct, and the learning exercises that will serve as efficient bases for
these activities.

Muthersbaugh, G. C. “Objectives of a Proposed Course of Study in
Physics for Senior High Schools,” School Science and Mathematics,
29:943-54, December, 1929.

Four texts in high school physics and four courses of study were selected
for analysis on the basis of the judgments of 16 leaders in the field of physics
teaching. These sources were examined for objectives, an objective being
defined as “a specific goal expressed in terms of useful life situations.” The
analysis resulted in 1018 slips, each containing a stated objective.
Classification and elimination of duplicates reduced the number to 275. A number
were then eliminated on the basis of inapplicability to useful life situations
after a rating on a ten-point basis with respect to usefulness, frequency of
occurrence, and interest. The 221 objectives finally presented in the report
are classified under 43 “units.” The author has added a list of 20
supplementary objectives. The study exemplifies the inadequacy of “objective”
data in curriculum research. Judgment was introduced in the elimination
and supplementation.

Peik, W. E. “The Analysis and Evaluation of College and University
Courses in Education,” Journal of Educational Research, 18:345-55,
December, 1928.

Syllabi or complete sets of lesson units submitted by thirteen instructors
of fifteen educational courses, including eight in special methods, were
analyzed into 814 topics of instruction. These topics were evaluated by a
group of alumni with respect to helpfulness in classroom teaching and
educational thinking. The alumni were also asked to indicate which topics
they remembered having been taught in the prescribed courses, which
should be omitted from such courses, and which were treated inadequately.

Peters, C. C. Objectives and Procedures in Civic Education. New York:
Longmans, Green and Company, 1930. 302 pp.

Chapter IV contains a “Blue Print of an Optimum Citizen” made up of
short statements of the objectives of education for citizenship as derived
from over a thousand separate studies made by the author and his students.
Included are suggestions of possible means and occasions for training to
meet these objectives. The student should read in this connection
Kilpatrick’s criticism and Peters’ defense of his philosophy and his method:
Kilpatrick, W. H. “Hidden Philosophies,” Journal of Educational Sociology,
4:59-68, September, 1930. Peters, C. C. “Revealed Philosophies — Reply
to Professor Kilpatrick,” Journal of Educational Sociology, 4:260-71,
January, 1931. Peters holds that the conflict between the opposing schools
of curriculum theorists and builders is a conflict of philosophies. The
pragmatists, Dewey, Kilpatrick, and Bode, stress the world in the making,
while Peters, whose philosophy is that of absolute idealism, does not feel
that the world is too chaotic to prevent systematic planning of curricula.

Reagan, G. W. “The Mathematics Involved in Solving High School
Physics Problems,” School Science and Mathematics, 25:292-99,
March, 1925.

A total of 241 problems in Millikan and Gale’s A First Course in Physics
were solved and analyzed. The results are reported under the headings of
arithmetic, algebra, and geometry.

Rugg, H. O., et al. “The Foundations and Technique of Curriculum-Making,”
The Twenty-Sixth Yearbook of the National Society for the
Study of Education. Bloomington, Illinois: Public School Publishing
Company, 1926. Part I, 475 pp. Part II, 210 pp.

Part I, “Curriculum-Making: Past and Present,” deals with the historical
development of the curriculum and present practices in curriculum
construction. Sections are devoted to examples of curriculum construction in
progressive public school systems and in private laboratory schools. The
last section of this volume contains a review and critique of
curriculum-making for the vocations, curriculum reconstruction on the college level,
curriculum-making by state legislatures, an appraisal of current methods
of curriculum-making, and an extensive annotated bibliography.

Part II, “The Foundations of Curriculum-Making,” presents a compilation
of principles with respect to the curriculum signed by the members of
the committee. This compilation is followed by supplementary statements
of the committee members indicating their individual points of view with
respect to the principles previously expressed. The volume concludes with
quotations on the curriculum from the writings of John Dewey and a
number of quotations from the Herbartians and their critics.

Stratemeyer, F. B., and Bruner, H. B. “Rating Elementary School
Courses of Study,” Studies of the Bureau of Curriculum Research of
Teachers College, Columbia University, Bulletin No. 1. New York:
Bureau of Publications, Teachers College, Columbia University, 1926.
193 pp.

In this monograph are described the techniques used in rating eight
hundred to a thousand courses of study in each of the subject-matter fields
of the elementary school. One hundred twenty-one judges participated,
using criteria formulated as rating scales and derived from examination
of a large number of courses of study. These criteria constitute one of the
chief contributions of the study. A list is given of the courses of study which
most nearly conform to the “best points of the criteria.”

Strong, E. K. “Job Analysis of the Manager of Industry,” School and
Society, 13:456-62, April 16, 1921.

The author discusses research conducted at Carnegie Institute of
Technology in which job analyses were made in three fields of executive work:
building construction, commercial printing, and metal working industries.

Tyrrell, Doris. “An Activity Analysis of Secretarial Duties as a Basis
for an Office Practice Course,” Journal of Experimental Education,
1:323-40, June, 1933.

In this study an evaluation is reported of 406 secretarial duties given in
the list compiled in previous research by Charters and Whitley. In addition
to the frequency ratings of Charters and Whitley, here reported in terms of
decile ranks, the author reports evaluations with respect to the importance,
difficulty, and desirability for pre-service training of each of the duties.
This study represents a stage in the process of curriculum construction
midway between the work of activity analysis and the formulation of a
course of study.

Wilson, G. M. “A Survey of the Social and Business Use of Arithmetic,”
Sixteenth Yearbook of the National Society for the Study of Education,
Part I. Bloomington, Illinois: Public School Publishing Company,
1917, pp. 128-29. See also:

Wilson, G. M. What Arithmetic Shall We Teach? Boston: Houghton
Mifflin Company, 1926, pp. 7-9, 30-51, 58-63.

The purpose of this research was to determine the arithmetic “actually
needed by social and business usage.” Sixth-, seventh-, and eighth-grade
pupils were asked to collect “every problem solved by either the father or
the mother . . . through a period of two weeks.” This is an important
pioneer study.

Wray, Robert P. “The Relative Importance of Items of Chemical
Information for General Education,” Journal of Experimental Education,
1:341-89, June, 1933.

A list of 1550 items of chemical information, secured as the result of an
analysis of four commonly used high school chemistry texts, was evaluated
in this study. In obtaining the evaluations, fifteen questionnaires of the
check-list type were sent to various groups of individuals. For example,
the first questionnaire was returned by 176 persons including teachers,
engineers, students, laborers, housekeepers, medical men, secretaries, and
business men.



CHAPTER XIII 


EVALUATING AND SYNTHESIZING EDUCATIONAL 
RESEARCH 

The need for evaluating reported conclusions. The preceding 
chapters have dealt primarily with the production of educational
research. Consumers of research, however, are much more
numerous than those engaged in production. Hence, it is
appropriate that we consider also the problems arising in the use
of reported research. An important question relates to the
acceptance of the stated conclusions at their face value. Is the
reader justified in accepting conclusions as reported, or should
he seek to determine for himself their dependability?

When considering this question one should bear in mind that 
educational research is a recent development. The first text on 
statistical methods applied to educational problems, Thorndike’s
Theory of Mental and Social Measurement, was not published
until 1904, and the development of instruments for measuring
mental traits and abilities had scarcely begun by 1910. Although
much has been accomplished in the development of research
techniques, especially since 1918, there are many unsolved
problems in the field of educational measurements, and none of
the many texts on statistical methods provides a satisfactory
treatment of statistical techniques. In view of these conditions
it is to be expected that some of the reported conclusions are not
dependable. The popularization of educational research and the
attainment of quantity production have operated to increase
greatly the amount of amateurish activity. The emphasis upon
the merits of objective data has tended to cause investigators to
neglect the limitations of the data being used, and the employment
of statistical techniques that were only vaguely
understood¹ has encouraged a mechanized interpretation, especially
in the case of differences and of coefficients of correlation.

A number of critical writers have commented upon the quality
of educational research,² pointing out that reported conclusions
are not infrequently lacking in dependability. Sometimes
the research has been done carelessly without a well-defined
problem. In other cases the data are inaccurate or incomplete,
and hence, even though the author has endeavored to apply
appropriate statistical techniques, his findings are subject to
qualifications and limitations which have not been adequately
noted in his report. Occasionally the statistical treatment is not
appropriate, and in a surprising number of cases there has been
failure to interpret statistics correctly and with precision. Hence,
a reader should be critical and not accept, without evaluation,
reported conclusions.

The need for summarizing reported research. The quantity 
production that has prevailed in educational research since soon 
after 1920 has resulted in a large number of investigations
relating to many topics and problems. Today there is scarcely
a topic for which it is not possible to compile an extensive
bibliography of references purporting to be reports of research. In
many cases a bibliography relating to a fairly narrow topic will
include more than a hundred items. Hence a person interested
in the results of research relating to a particular topic or problem
frequently faces a task that requires many hours of labor. When
a comprehensive bibliography is examined it is not infrequently
found that several references are to unpublished studies. Others
are to bulletins or periodicals not easily accessible, especially to
the practical schoolman. Even when the references listed are
accessible the task of reading a large number of reports requires
time. The practical schoolman seldom has the time or, if he
does, the task does not appear to be the most worthy one
commanding his attention. Hence, he selects a few of the most
attractive and most accessible references and accepts the findings
of these studies as representing the results of research relating to
the problem in which he is interested. Graduate students and
other persons, except the more systematic and conscientious
research workers, frequently follow about the same plan. The
need for summaries of the results of research is therefore
apparent.

1 Lehman, H. C., and Witty, P. A. “Statistics Show —,” Journal of
Educational Psychology, 19:175-84, March, 1928.

Although reference is not made to particular studies, this article is a severe
indictment of the statistical aspects of educational research.

2 For illustrations of critical statements, see Chapter XIV, page 467.

In addition to facilitating the use of educational research,
summaries are helpful to the research worker. Critical sum- 
maries make evident the phases of problems that have been in- 
adequately studied thereby assisting an investigator in defining 
his own problem. If the summary includes comments relative 
to the advantages and disadvantages of the techniques em- 
ployed, he will be helped in deciding upon the details of his 
procedure. In case a critical summary is not available, the first
step in planning a research should be a review of the studies 
relating to the proposed problem. Not infrequently the number 
of such studies is so great that the task of evaluating them and 
synthesizing the findings consumes so much time that the in- 
vestigator’s enthusiasm is greatly diminished. When his time is 
limited, as in the case of graduate students, he finds that when 
the summary is completed he has little or no time for his own 
inquiry. 

Published summaries. The need for summarizing the results 
of educational research is being recognized. In 1925, W. S. Gray
published a summary of investigations in the field of reading
which has been supplemented annually. In 1925, Buswell and
Judd published a summary of investigations in arithmetic which
also has been supplemented annually by Buswell. There are a
number of other extensive summaries, notably that of Curtis,
A Digest of Investigations in the Teaching of Science, which
appeared in 1926 and was supplemented in 1931; and Lyman,
Summary of Investigations Relating to Grammar, Language, and
Composition, which appeared in 1929. Many reports of research
include a summary of previous summaries relating to the
problem. In February, 1931, the American Educational Research
Association began the publication of the Review of Educational
Research, which will present once every three years a summary
of the research relating to the following general topics.¹

1. Methods and Techniques of Educational Research.
2. The Curriculum.
3. Teacher Personnel.
4. School Organization.
5. Psychology and Methods in the High School and College.
6. Special Methods and Psychology of the Elementary School Subjects.
7. Finance and Business Administration.
8. Psychological Tests.
9. Buildings, Grounds, Equipment, and Supplies.
10. Educational Tests.
11. Mental and Physical Development and Individual Differences.
12. Student Accounting, Personnel, and Guidance.
13. General Methods and Supervision.
14. History of Education and Comparative Education.
15. Legal Basis of Education.

The quality of published summaries. Unfortunately few of 
the summaries being published are critical. In most cases the 
reported conclusions of the studies listed are presented without 
evaluation. For example, in a recent summary of which the
bibliography totaled 885 references, evaluative statements were
made in the case of only 52 of the studies, and only 5 of the
evaluative statements indicated that the reported conclusion
was lacking in dependability or was subject to qualification. In
view of the difficulties that are encountered in educational
research and the inadequacies of our techniques, it is inconceivable
that in such a comprehensive list of references there should not
be a much larger number of researches whose reported findings
deserve criticism. This summary is, perhaps, not typical, but an
extended examination of published summaries indicates that
only relatively few may be regarded as critical.² This is
unfortunate because summarization without critical evaluation
contributes to the perpetuation of error and triviality.

1 The list of topics for the first three-year cycle ending December, 1933, was
slightly different. See page 462 for the list.

2 For a systematic evaluation of two summaries, see Wilson, G. M.
“Research: Suggested Standards for Summarizing and Reporting Applied to Two

Evaluation and summary as a type of research. Although the 
summarization of pertinent studies should be a phase of any 
piece of research, it seems appropriate to propose evaluation 
and summary as a type of research, especially when the field 
considered is at all comprehensive. Such endeavors require the
utilization of critical reflective thinking and a scientific attitude, 
the collection of the best data obtainable in a systematic manner, 
and the use of these data with full recognition of their
limitations. The total procedure of preparing a critical summary
conforms with the definition of educational research given in
Chapter I. Although critical summaries, however comprehensive, can
seldom be termed original contributions in the sense of providing
new knowledge, when well done they contribute to the organization
of the science of education. The evaluation, interpretation,
and synthesis of the findings of original studies is an essential
phase of the development of a science.

Phases of the preparation of a critical summary. The first step 
in the preparation of a summary is the compilation of a
bibliography of researches relating to the problem or topic. The
second step is to examine critically each of the researches to
determine the dependability of the reported conclusions. If
a conclusion as stated is not judged to be dependable the re- 
viewer may be able to formulate one that appears to be justified 
by the data collected, or at least to supplement the stated con- 
clusion with suitable limitations and qualifications. The final 
step is to organize and synthesize the dependable conclusions
into a summary account of what research has revealed relative 
to the phases of the problem or topic. 
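
The three phases just described may be sketched as a small routine. This is a minimal illustration under stated assumptions rather than a prescribed method: the record layout (Study), the function name critical_summary, and the three sample studies are all hypothetical, and the judgment of dependability is assumed to have been made by the reviewer beforehand.

```python
from dataclasses import dataclass


@dataclass
class Study:
    reference: str           # bibliographic entry (phase 1)
    conclusion: str          # conclusion as stated by the author
    dependable: bool         # reviewer's judgment after examination (phase 2)
    qualification: str = ""  # limitation supplied by the reviewer, if any


def critical_summary(bibliography):
    """Keep dependable conclusions, attaching any reviewer qualifications."""
    findings = []
    for study in bibliography:
        # A conclusion judged undependable is kept only if the reviewer
        # has supplied a qualification that makes it usable.
        if not study.dependable and not study.qualification:
            continue
        text = study.conclusion
        if study.qualification:
            text += f" (qualified: {study.qualification})"
        findings.append((study.reference, text))  # phase 3: synthesis
    return findings


# Hypothetical entries standing in for a compiled bibliography.
bibliography = [
    Study("Study A", "Method X raised scores.", True),
    Study("Study B", "Method X had no effect.", False,
          "only one class was tested"),
    Study("Study C", "Method X lowered scores.", False),
]

summary = critical_summary(bibliography)
for reference, text in summary:
    print(f"{reference}: {text}")
```

The sketch records only the outcome of the reviewer's examination; the examination itself remains a matter of critical judgment and cannot be reduced to such a routine.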

Compiling a bibliography.¹ Published bibliographies are
available for a large number of fields and topics, and frequently a

Recent Summaries of Studies in Arithmetic,” Journal of Educational Research,
28:187-94, November, 1934.

1 For a comprehensive treatment see Alexander, Carter. How to Locate
Educational Information and Data. New York: Bureau of Publications,
Teachers College, Columbia University, 1935. 272 pp.
worker will find several lists of references from which he may
compile a bibliography for the topic or problem in which he is
interested. The bibliographies of the Review of Educational
Research include practically all of the better studies for the
period covered, and the selected lists of references in the
Elementary School Journal and the School Review are helpful.¹
Other lists of references may be located by consulting a
bibliography of bibliographies.² The Education Index, initiated
January, 1929, indexes by topic and author educational
periodicals in English, a few of the more important educational
periodicals in foreign languages, and current books, pamphlets,
and documents. The Psychological Index, published since 1894,
is helpful in compiling a bibliography in the field of psychology.³
The publications of the United States Office of Education include
many bibliographies.⁴ Titles of doctors’ and masters’ theses in
education for the period January, 1917, to October, 1927, are
listed in compilations prepared by the Bureau of Educational
Research at the University of Illinois.⁵ Since 1927, graduate

1 The publication of these lists of selected references was begun in 1933. At
the end of the year, the twenty bibliographies are assembled and published in a
monograph with a title of “Selected References in Education.”

2 Monroe, W. S., and Asher, Ollie. “A Bibliography of Bibliographies,”
University of Illinois Bulletin, Vol. 24, No. 44, Bureau of Educational Research
Bulletin, No. 36. Urbana: University of Illinois, 1927. 60 pp.

Monroe, W. S., Hamilton, T. T., and Smith, V. T. “Locating Educational
Information in Published Sources,” University of Illinois Bulletin, Vol. 27, No.
45, Bureau of Educational Research Bulletin, No. 50. Urbana, Illinois: University
of Illinois, 1930, pp. 58-142. This bulletin includes the more important
bibliographies listed in the earlier publication.

Monroe, W. S., and Shores, Louis. Bibliographies and Summaries in
Education. New York: The H. W. Wilson Company, 1936. 465 pp. Over 4000
references are indexed.

3 For information in regard to other bibliographical aids, see Witmer, E. M.,
and Miller, M. C. “Guides to Educational Literature in Periodicals,” Teachers
College Record, 33:719-30, May, 1932.

4 For information in regard to these publications, see Witmer, E. M., and
Miller, M. C. “U. S. Office of Education Serial Publications,” Teachers College
Record, 34:302-11, January, 1933.

5 The reference to the most recent compilation is Monroe, W. S. (Compiler).
Titles of Masters’ and Doctors’ Theses in Education Accepted by Colleges and
Universities in the United States between October 15, 1925 and October 15,
1927. Urbana, Illinois: College of Education, University of Illinois, 1928.
252 pp.
theses have been included in the Bibliography of Research Studies
in Education published annually by the United States Office of
Education. A number of institutions are publishing annotated
lists of the theses accepted in partial fulfillment for graduate
degrees in education. Other institutions publish merely lists of
titles.¹
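
When several published lists are drawn upon, the same reference will often appear more than once. The following sketch shows one way of combining lists while dropping repeated entries; it is offered only as an illustration, the function name merge_reference_lists and the sample entries are hypothetical, and matching on lower-cased, space-normalized text is an assumption that will not catch entries cited in different forms.

```python
def merge_reference_lists(*lists):
    """Combine several lists of references, dropping repeated entries."""
    seen = set()
    merged = []
    for references in lists:
        for reference in references:
            # Normalize case and spacing so trivially different copies match.
            key = " ".join(reference.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(reference)
    return merged


# Hypothetical entries standing in for two published lists of references.
review_list = ["Author A. Title One. 1930.", "Author B. Title Two. 1931."]
index_list = ["author a.  title one. 1930.", "Author C. Title Three. 1932."]

combined = merge_reference_lists(review_list, index_list)
for entry in combined:
    print(entry)
```

The first form in which an entry is met is the one retained, so the order in which the lists are supplied determines which wording survives.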

The graduate student in education should become familiar
with the important journals, yearbooks, and monograph series²
so that he will be able to locate likely sources of relevant refer- 
ences. Under the direction of Carter Alexander, Library Profes- 
sor, Teachers College, Columbia University, a series of guides 
to the professional literature related to various phases of educa- 
tion have been prepared. These are being published in various 
periodicals so as to bring each article to the attention of those 
readers who are most likely to be interested in it.® In compiling a 
bibliography it is frequently desirable to ^Hhumb through’^ the 
pages of the journals that are known to contain reports of re- 
search in the field of the problem. If it does not seem worth 
while to look through the pages of the volumes, it is almost as 
effective to scan the indexes. Thumbing the pages is more 
effective since it overcomes the handicap of general or misleading 
titles. 

The graduate student, or other research worker in education, 

1 See Derring, C. E. “Lists and Abstracts of Masters' Theses and Doctors'
Dissertations in Education,” Teachers College Record, 34: 490-502, March, 1933.

The worker who is interested in learning about research under way or
completed but not published will find information in regard to sources in the
following reference:

Witmer, E. M. “Educational Research: A Bibliography on Sources Useful
in Determining Research Completed or under Way,” Teachers College Record,
33: 335-40, January, 1932.

2 For a comprehensive description of educational serials, see Monroe, Hamilton,
and Smith, op. cit., pp. 19-57.

3 The one for the field of secondary education was published in the School
Review:

Manske, A. J., and Alexander, Carter. “Guide to the Literature on Secondary
Education,” School Review, 42: 368-81, May, 1934.

In addition to a selected list of bibliographies, information is given relative to
sources of information for such topics as the following: periodicals, associations,
book reviews, editorial comment, news notes, researches completed, under way,
or needed, and statistics.

should be systematic in compiling a bibliography. He should
use each of the aids suggested completely before going on to the
next. That is to say, he should extract all the help the aid has to
offer before looking elsewhere for references. It is desirable to
keep a memorandum of the sources from which a bibliography
has been compiled. The following is an example of such a
record:

1. Education Index, January, 1929, to September, 1932.

2. Journal of Educational Research, January, 1920, to September, 1932.

3. Teachers College, Columbia University Contributions to Education,
No. 300 to No. 545.

When such a record of sources is kept, it is a simple matter to
bring the bibliography up to date. Furthermore, a record of the
sources tends to stimulate the student to more systematic and
scholarly effort.
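
A memorandum of sources of the kind just described can also be kept in machine-readable form. The following is a minimal sketch in Python, offered only as one possible arrangement and not as a prescribed procedure. The names `SourceRecord` and `format_memorandum` are hypothetical, and the entries merely repeat the example record given above.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """One bibliographical aid and the span of it already searched."""
    title: str
    searched_from: str
    searched_to: str


def format_memorandum(records):
    """Return the memorandum as numbered lines, one line per source."""
    return [
        f"{i}. {r.title}, {r.searched_from} to {r.searched_to}"
        for i, r in enumerate(records, start=1)
    ]


# The entries below repeat the example record given in the text.
memorandum = [
    SourceRecord("Education Index", "January, 1929", "September, 1932"),
    SourceRecord("Journal of Educational Research", "January, 1920", "September, 1932"),
    SourceRecord(
        "Teachers College, Columbia University Contributions to Education",
        "No. 300",
        "No. 545",
    ),
]

for line in format_memorandum(memorandum):
    print(line)
```

Under this arrangement, bringing the bibliography up to date amounts to resuming each source at the point recorded in `searched_to` and then revising that entry.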

In copying a reference, complete information should be
recorded. The use of 3 × 5 or 4 × 6 cards is recommended.
Larger cards, or even manuscript paper, are desirable when
annotations are to be made.

Criteria for identifying educational research. It may appear
unnecessary to raise the question of identifying educational
research, but certain writers have pointed out that much of
what is commonly called educational research does not deserve
this label. For example, Rugg has said that “most of our
so-called ‘educational research’ is not educational research at
all.”¹ The same position is taken in an editorial in the School
Review for September, 1926, commenting on the “Bibliography
of Secondary Education Research, 1920-25.”²

Such criticisms imply criteria by means of which reports of 

1 Rugg, H. O. “Statistical Methods Applied to Educational Testing,”
Twenty-First Yearbook of the National Society for the Study of Education.
Bloomington, Illinois: Public School Publishing Company, 1922, pp. 45-91.

2 Windes, E. E., and Greenleaf, W. J. “Bibliography of Secondary Education
Research, 1920-25,” U. S. Bureau of Education Bulletin, 1926, No. 2.
Washington, 1926. 95 pp.

educational research may be distinguished from writings that
do not represent research, but no authoritative list has been
formulated. Thoughtful consideration of the matter reveals the
difficulty. No sharp line of demarcation can be defined.
Although the extremes of both groups of writings stand out clearly,
one group tends to merge into the other. The difficulty of
identifying research is accentuated by reason of the fact that some
investigators quit before their data have been adequately
interpreted and that others are careless or inexpert in the use
of the techniques of research.

In identifying writings that should be labeled “research,”
it seems desirable to be rather highly selective. From the point
of view of developing a science of education, studies that
represent a mere collection of information or whose findings have
little significance outside of the situation studied must be
regarded as trivial. Such studies may be useful in the
administration of a school or in other practical activities, but their
contribution to a science of education is slight. Frequently there is
no direct contribution. Hence, it seems desirable to label such
investigations “service studies” and to restrict the use of
“research” to studies that have a distinct contributory value.
It is from this point of view that the following criteria are
suggested as a means of identifying research.

1. There should be a problem which functions as a guide in
collecting data and in the subsequent phases of the work. Usually
this problem is clearly defined by the investigator as a preliminary
phase of his work.

2. An essential requirement is that the data collected afford some 
basis for generalization and that the interpretation of the data 
be continued until a tentative generalization is reached. As 
used here a generalization designates a statement of conditions, 
trends, or relationships which may be utilized as a basis of 
thinking or action in situations other than the particular one 
studied. 

3. Another essential requirement is that in interpreting the data
adequate recognition be given to their faults and to the
limitations of the statistical procedures employed in handling
them.

The first criterion affords a basis for eliminating discussions,
expressions of opinions, propagandistic writings, and the like.
There must be systematic collecting of data pertinent to a
problem. Application of the second criterion will result in the
rejection of studies in which the data collected are lacking in
representativeness or for other reasons do not afford a basis for
generalizing. Among the studies thus rejected will be those
whose significance is only local, those based upon too few
cases,¹ and those in which no consideration has been given to the
representativeness of the data. Other studies will be rejected
because the author failed to continue his inquiry to the stage of
generalization. Such studies may be useful to another
investigator and if other requirements are satisfied may be
appropriately labeled “incomplete research.” Application of the
third criterion will result in the rejection of studies in which the
interpretation of the data has not been critical. Some of those
thus rejected might be labeled “poor research,” meaning thereby
that the cause of rejection is the failure to use appropriate
techniques or to use skillfully the techniques employed.

If a list of titles has been compiled without examining the
writings, application of these criteria will usually result in the
rejection of a large proportion of the items. Some of the
references will be found to be essentially only discussions or
descriptions of educational practice. A few are likely to be proposals
for research. Still others will be found to be trivial.

Selecting research pertinent to one's problem. In addition
to identifying the references that justify the label of research it is
necessary to sort out those that are pertinent to one's problem
or topic. A prerequisite for doing this is a precise definition of
the problem or topic. A reference may be “interesting” but
if it is not pertinent, failure to exclude it will add to the labor of
evaluation. Hence, before a reference is read critically it should

1 No definite number of cases can be prescribed as a prerequisite for
generalization. If the area to which the generalizations apply is restricted, a small
number of cases may be sufficient. For example, certain generalizations relative to
highly gifted children might be justified from an intensive study of ten typical

be examined to determine its pertinency to the problem or
topic under consideration.

Evaluation of reported research. The purpose in evaluating a 
piece of educational research is to arrive at an estimate of the 
dependability of the author's conclusions. The best test of the 
dependability of a conclusion is to repeat the reported investiga- 
tion duplicating the conditions and techniques as nearly as 
possible. This procedure, however, is seldom feasible. Hence, it 
is usually necessary for a reviewer to resort to a critical examina- 
tion of the report and when possible to comparison with other 
studies of the same or closely related problems. 

The evaluation of a reported study involves essentially the
same considerations as the determination of dependability by
the author to which attention has been directed in the preceding
chapters, especially VIII, IX, X, and XI.¹ The reviewer,
however, is handicapped by reason of the fact that he has only the
information that the author has reported. He should seek the
definition of the problem. If it is not given, he should endeavor
to formulate a precise statement of it. Next, the general
character of the data and the techniques employed in collecting
them should be noted. The most important phase of the
evaluation is to ascertain the probable faults of the data ² and the
compatibility of the stated conclusions with the data when their
faults and limitations are considered.

There are few definite techniques that may be employed for
identifying the faults in the data of a research. In general, one
who attempts to evaluate a piece of research must rely upon
his acquaintance with educational data and the techniques of
educational research. An experienced person who is critically
minded will usually be able to estimate the faults of the data.
The reviewer should also examine the handling of the data and
note any inappropriate procedures or incorrect interpretations
of the findings. When possible, it is sometimes desirable to
check calculations.

1 For specific page references, see “dependability” in the index of this volume.

2 For a general exposition of these faults and their significance, see Chapter V.

When the reviewer has two or more studies of the same
problem or related problems, comparison of the reported
conclusions may yield an indication of their dependability.
Comparisons, however, must be made with caution, and the fact
that two reported conclusions are not in agreement does not
necessarily mean that one of the conclusions is lacking in
dependability. It is possible that the disagreement may be
explained by differences in the population studied or by other
phases of procedure which might not be apparent from a casual
examination of the reports.

The reviewer may also test reported conclusions by
considering their compatibility with general educational theory,
with his experience in school affairs, and with logic or common
sense. For example, the conclusion that drill in the fundamentals
in arithmetic is effective in engendering skill in calculation may
be regarded as dependable since it is in harmony with the
general principle that practice increases skill. This conclusion is
also compatible with common sense. In case a reported
conclusion does not appear reasonable or is not compatible with
the reviewer's experience he is justified in being suspicious of
the research. It, of course, does not follow that the reported
conclusion is not dependable, and hence the apparent
reasonableness of a conclusion cannot be considered a final
criterion.

Taking notes on references. In taking notes on a reference,
the reviewer should be guided by his problem. Rereading will
be necessary if items pertinent to his problem are omitted.
There is also a waste of time if any large amount of irrelevant
information is included in the notes. The reviewer should,
therefore, have clearly in mind the items of information to be
looked for in each reference. It is frequently desirable to prepare
a check list, or a data sheet, to use as a guide in the reading of
reports of educational research. For example, let us assume
that the nature of the problem is such that most of the previous
research may be expected to be of the experimental type. Then a
check list containing the following items will be useful:

1. Reference
2. Experimental factor or factors
3. Dependent variable
4. School subject
5. Population, size of groups and their grade placement
6. Representativeness of population
7. Equivalence of experimental and control groups
8. Control of non-experimental factors
9. Duration of experiment
10. Tests used for measuring dependent variable
11. Differences in gains, or other measures of the relative
achievement of the groups
12. Conclusions and generalizations
13. Evaluation

If many references are to be read, it is helpful to prepare
mimeographed data sheets.¹ These sheets can be divided by
horizontal and vertical lines into spaces for writing notes
relative to the items to be noted. These spaces may be labeled
with appropriate headings, but it is simpler to number them to
correspond with the items of the check list. It should not be
inferred, however, that a data sheet or check list can be used
in a purely routine fashion. The reviewer will often encounter,
in a report of research, information which is relevant to his
problem, but which is not referred to by the items of the check
list or the headings of the data sheet. When this occurs, notes
pertaining to such information should be recorded.
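
The data sheet described above, with spaces numbered to correspond with the items of the check list, may be sketched in Python as follows. This is an illustrative arrangement under stated assumptions rather than a prescribed form: the names `CHECK_LIST`, `DataSheet`, `record`, and `missing_items` are hypothetical, and the notes entered are placeholders, not findings from any study.

```python
from dataclasses import dataclass, field

# The thirteen items of the check list for experimental studies, in order.
CHECK_LIST = [
    "Reference",
    "Experimental factor or factors",
    "Dependent variable",
    "School subject",
    "Population, size of groups and their grade placement",
    "Representativeness of population",
    "Equivalence of experimental and control groups",
    "Control of non-experimental factors",
    "Duration of experiment",
    "Tests used for measuring dependent variable",
    "Differences in gains, or other measures of the relative achievement of the groups",
    "Conclusions and generalizations",
    "Evaluation",
]


@dataclass
class DataSheet:
    """Notes on one report, keyed by check-list item number (1 to 13)."""
    notes: dict = field(default_factory=dict)
    # Relevant information that no check-list item refers to.
    other_notes: list = field(default_factory=list)

    def record(self, item_number, note):
        """Write a note in the space numbered item_number."""
        if not 1 <= item_number <= len(CHECK_LIST):
            raise ValueError(f"no check-list item numbered {item_number}")
        self.notes[item_number] = note

    def missing_items(self):
        """Return the check-list items for which no note has been written."""
        return [
            CHECK_LIST[n - 1]
            for n in range(1, len(CHECK_LIST) + 1)
            if n not in self.notes
        ]


sheet = DataSheet()
sheet.record(9, "(duration as stated in the report)")  # placeholder note
sheet.other_notes.append("(relevant information outside the check list)")
print(len(sheet.missing_items()))
```

The separate `other_notes` list reflects the caution that the check list is not to be used in a purely routine fashion; `missing_items` merely shows which spaces remain blank before the reviewer goes on to another study.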

The reviewer should strive to be accurate in note-taking. 
The notes should be checked against the report of research
before going on to another study. Where quotations are
included, this fact should be indicated by means of quotation
marks. This precaution is one which may be instrumental in
preventing the use of quoted material as one's own — uninten- 
tional plagiarism, but plagiarism nevertheless. In addition to 
indicating whether or not information is quoted, it is desirable 
to note the page on which the information is to be found. This 

1 The Alexander Universal Bibliography Card, obtainable from the Bureau of
Publications, Teachers College, Columbia University, is a useful record form
when only brief notes are being made.

should always be done for direct quotations and usually for
the more important notes that are not quotations.

Organizing the research relating to a problem or topic. After
the researches relating to the problem or topic have been
reviewed they should be classified so as to bring together those
pertaining to the same phase or that are similar in other
respects. Usually the definition of the problem or topic will
suggest captions for this classification. For example, in
summarizing the research relating to the teaching of arithmetic
the present writers employed the following major divisions:¹

I. Methods of teaching and learning the fundamentals
II. Methods of drill in the fundamentals
III. Methods of teaching pupils to solve verbal problems
IV. Methods of diagnosis and remedial treatment
V. Methods of teaching the reading of arithmetical subject-matter
VI. Motivation of learning activity in arithmetic

Usually the particular outline evolved is not a matter of
paramount importance. Frequently, there may be two or more
organizations that will be effective as a plan for summarizing
the research, but when the number of studies is large the
formulation of an outline, such as that just illustrated, is an essential
step. The effectiveness of a summary depends upon its
organization. Hence, the problem and the several researches should
be examined carefully in an effort to arrive at an effective plan
of organization.

How much description of the researches to include in a
summary. A troublesome problem in preparing a summary is to
determine how much description of the researches to include.
No general rule can be stated. If the number of researches 
summarized is large and each is described in detail, a reader is 
likely to wish for a summary of the summary. The criticism 
implied in such a wish may be partially avoided by separating 
the description of the researches from their evaluation. But a 

1 Monroe, W. S., and Engelhart, M. D. “A Critical Summary of Research
Relating to the Teaching of Arithmetic,” University of Illinois Bulletin, Vol.
29, No. 5, Bureau of Educational Research Bulletin, No. 58. Urbana, Illinois:
University of Illinois, 1931. 115 pp.

reviewer should attempt to restrict his descriptions to essential
items. When the research is of a conventional type, the
description may be restricted to items that are indicative of the
dependability of the findings. It is seldom desirable to report
the findings in detail. Details of the treatment of the data
should not be included unless they are essential for the
evaluation. Sometimes the description of several similar researches
may be combined and attention directed to points of similarity
and points of difference. In many cases an abbreviated
description will be satisfactory and in general a reviewer should
endeavor to reduce the descriptions of the researches to a
minimum, but no essential items should be omitted.

Evaluative statements important. The reviewer's evaluation
should be included in the summary. Sometimes it may be
sufficient to indicate the evaluation by such phrases as
“carefully conducted experiment,” “not convincing,” “open to rather
serious criticism,” and “critically reported.” In the case of
major studies, it is desirable to point out the reasons for the
evaluation. In general, a study should not be referred to in a
summary without some indication of the reviewer's evaluation.
Failure to conform to this rule is likely to contribute to the
perpetuation of error and triviality.

Synthesis of dependable findings. A summary in which the
several evaluated findings are reported in a serial order is not
satisfying, especially when more than three or four studies
relating to a particular problem are being considered. Such a
summary suggests the notes taken as the studies were read and
leaves to the reader the task of synthesizing the several findings
into a composite conclusion. Unfortunately a large proportion
of the available summaries are essentially nothing more than
classified annotated bibliographies.

The description of the researches and the presentation of
their findings should culminate in a synthesized statement of
the conclusions justified by the researches as a group. This
synthesis should be stated with precision and the
dependability of the general conclusions should be made clear. Unless

the researches reviewed are so fragmentary and so unrelated
that a synthesis is not feasible, a summary that does not
eventuate in a generalization may appropriately be called
incomplete.


ILLUSTRATIVE SUMMARIES 

The following references are not cited as model summaries, but they are
probably representative of the more scholarly writings of this type.

Brownell, W. A. “The Techniques of Research Employed in Arithmetic,”
Twenty-Ninth Yearbook of the National Society for the Study of
Education. Bloomington, Illinois: Public School Publishing
Company, 1930, pp. 415-43.

Buswell, G. T., and Judd, C. H. “Summary of Educational Investigations
Relating to Arithmetic,” Supplementary Educational Monographs,
No. 27. Chicago: University of Chicago, 1925. 212 pp.

Corey, S. M. “The Present State of Ignorance about Factors Effecting
Teacher Success,” Educational Administration and Supervision,
18: 481-90, October, 1932.

Crooks, A. D. “Marks and Marking Systems: A Digest,” Journal of
Educational Research, 27: 259-72, December, 1933.

Engelhart, M. D. “Techniques Used in Securing Equivalent Groups,”
Journal of Educational Research, 22: 103-09, September, 1930.

Fryer, Douglas. The Measurement of Interests. New York: Henry Holt
and Company, 1931. 488 pp.

Hilliard, G. H. “Probable Types of Difficulties Underlying Low Scores in
Comprehension Tests,” University of Iowa Studies, Studies in
Education, Vol. 2, No. 6. Iowa City: University of Iowa, 1924, pp. 13-36.

Hudelson, Earl. “Class-Size Opinions, Evidence, and Policies in
Secondary Schools,” North Central Association Quarterly, 4: 196-208,
September, 1929.

Knudsen, C. W. “Psychology and Methods in the High School and
College — Social Studies,” Review of Educational Research, 4: 462-65,
December, 1934.

Lee, J. M., and Symonds, P. M. “New-Type of Objective Tests: A
Summary of Recent Investigations,” Journal of Educational Psychology,
24: 21-38, January, 1933.

Leonard, J. P. “Psychology and Methods in the High School and
College — English Language, Reading, and Literature,” Review of
Educational Research, 4: 449-61, December, 1934.

Lyman, R. L. “Summary of Investigations Relating to Grammar,
Language, and Composition,” Supplementary Educational Monographs,
No. 36. Chicago: University of Chicago, 1929. 302 pp.

Monroe, W. S., and Engelhart, M. D. “Stimulating Learning Activity,”
University of Illinois Bulletin, Vol. 28, No. 1, Bureau of Educational
Research Bulletin, No. 51. Urbana: University of Illinois, 1930,
pp. 42-67.

National Education Association, Research Division. “Contributions
of Research to Curriculum Building,” Research Bulletin, Vol. 3,
Nos. 4 and 5. Washington: National Education Association, 1925,
pp. 125-61.

Powers, S. R. “Psychology and Methods in the High School and
College — Science,” Review of Educational Research, 4: 473-78, December,
1934.

Rock, R. T., Jr. “A Critical Study of Current Practices in Ability
Grouping,” Catholic University of America, Educational Research Bulletin,
Vol. 4, Nos. 5 and 6. Washington: Catholic Education Press, 1929.
132 pp.

Symonds, P. M. “Methods of Investigation of Study Habits,” School and
Society, 24: 145-52, July 31, 1926.

Yaukey, J. V., and Anderson, P. L. “A Review of the Literature on the
Factors Conditioning Teaching Success,” Educational Administration
and Supervision, 19: 511-20, October, 1933.



CHAPTER XIV 


PROGRESS TOWARD A SCIENCE OF EDUCATION 

The development of a scientific attitude in education. A
significant phase of our progress toward a science of education ¹
is found in the growth of a scientific attitude in education.
When Rice reported his study of spelling at the meeting of the
Department of Superintendence in 1897, the audience was
distinctly hostile to the implied thesis that the outcomes of spelling
instruction could be measured by administering a test.² The

1 Some readers will doubtless raise the question of the possibility of a science
of education. This question is not new and it is interesting that many writers
answered it in the affirmative before 1900. In the following list, Royce is the
only one who does not give an affirmative answer.

Payne, J. “Science of Education,” Barnard's American Journal of Education,
26: 465-68, 1876.

Allen, Jerome. “Have We a Science of Education?” Education, 2: 284-90,
January, 1882.

Bain, A. Education as a Science. New York: Appleton Company, 1884.
453 pp.

Payne, W. H. Contributions to the Science of Education. New York: American
Book Company, 1886. 358 pp.

Royce, J. “Is there a Science of Education?” Educational Review, 1: 15-25,
January, 1891.

Scripture, E. W. “Education as a Science,” Pedagogical Seminary, 2: 111-14,
1892.

Findlay, J. J. “The Scope of a Science of Education,” Educational Review,
14: 236-47, October, 1897.

For a vigorous defense of the thesis that a science of education is possible, see
Phillips, D. E. “What Is Scientific?” Journal of Educational Psychology,
23: 299-308, April, 1932. In this article the suggestion is made that if
mathematicians and workers in the field of the more exact sciences wish to take issue with
the thesis that a science of education is possible, different levels of scientific
endeavor might be recognized.

The interested reader will find the following reference helpful:

Demiashkevich, M. J. “The Science of Education,” Phi Delta Kappan,
14: 184-86, April, 1932.

2 This study concerns the relation between the minutes per day devoted to
the teaching of spelling and the spelling ability of the pupils. See page 271. For
the original report, see Rice, J. M. “The Futility of the Spelling Grind,” The
Forum, 23: 163-72, 409-19, April, June, 1897. These articles also appear as
Chapters V and VI in Rice, J. M. Scientific Measurement in Education. New

reaction of this audience is indicative of the prevailing attitude
at the close of the nineteenth century. Galton,¹ James,²
Hall,³ Cattell,⁴ and other psychologists were attempting to
apply the methods of science, but their efforts had little direct
influence upon the thinking of school administrators and
teachers.

During the first decade of the twentieth century, we find
indications of a change in attitude. There was a marked
increase in the number of experimental studies of the transfer of
training.⁵ In 1904 Superintendent Maxwell of New York City
included in his annual report an age-grade study of the
elementary schools of that city.⁶ The appearance of this report
appears to have stimulated interest in the questions of
retardation and elimination. Within a period of less than ten years, a
number of elaborate studies were made, of which Thorndike's
study, “The Elimination of Pupils from School,”⁷ in 1907

York: Hinds, Noble, and Eldredge, 1912. Rice's account of the reception of his
report is given on pages 17-18 of this reference.

1 Galton, Francis. Hereditary Genius: An Inquiry into Its Laws and
Consequences. London: The Macmillan Company, 1914. 379 pp. (First Edition, 1869.)
Galton, Francis. Inquiries into Human Faculty and Its Development. London:
The Macmillan Company, 1883. 387 pp.

2 James, William. Principles of Psychology, Vol. 1. New York: Henry Holt
and Company, 1890, pp. 666-68, footnote.

3 Hall, G. S. “The Contents of Children's Minds on Entering School,”
Pedagogical Seminary, 1: 138-73, 1891.

Hall, G. S. Life and Confessions of a Psychologist. New York: D. Appleton
and Company, 1923. 623 pp. The student will find this autobiography an
excellent source of information with respect to Hall's work.

4 Cattell, J. McK. “Mental Tests and Measurements,” Mind, 16: 373-80,
July, 1890.

Cattell, J. McK., and Farrand, Livingston. “Physical and Mental
Measurements of the Students of Columbia University,” Psychological Review, 3: 618-48,
November, 1896.

5 Rugg, H. O. The Experimental Determination of Mental Discipline in School
Studies. Baltimore: Warwick and York, Inc., 1916. 132 pp.
Rugg gives an analytical summary of twenty-nine studies. Three appeared
before 1900, six during the next five years, and twenty during the period 1906-1916.
It is significant that only one study of transfer under school conditions was made
before 1906 whereas nine such studies were made during the ten years following.

6 Maxwell, W. H. Sixth Annual Report of the City Superintendent of Schools,
New York, 1904, pp. 42-49.

7 Thorndike, E. L. “The Elimination of Pupils from School,” U. S. Bureau of
Education Bulletin, No. 4. Washington: Government Printing Office, 1907.

pp

appears to have been the first. It was concerned chiefly with
elimination, but some attention was given to retardation and
acceleration. A couple of years later, 1909, Ayres published a
somewhat more comprehensive investigation under the title
Laggards in Our Schools.¹ In 1911, Strayer published a
study ² that presented age-grade data for a number of city
school systems, colleges, and universities. In the same year
two other reports appeared, one ³ of which dealt chiefly with
the progress of pupils, rather than with age-grade conditions,
and the other ⁴ with retardation. Meyer's study of teachers'
marks ⁵ reported in 1908 was followed by investigations by
Dearborn,⁶ Starch and Elliott,⁷ and Kelly.⁸ The First Yearbook
of the National Society of College Teachers of Education,
published in 1911, was devoted to the subject “Research within
the Field of Education, Its Organization and
Encouragement.”⁹ Binet, who had been working with psychological
tests for a number of years, devised and published in collaboration

1 Ayres, L. P. Laggards in Our Schools. New York: Charities Publication
Committee, 1909. 236 pp.

2 Strayer, G. D. “Age and Grade Census of Schools and Colleges,” U. S.
Bureau of Education Bulletin, No. 5. Washington: Government Printing Office,
1911. 144 pp.

3 Keyes, C. H. “Progress through the Grades of City Schools,” Teachers
College, Columbia University Contributions to Education, No. 42. New York:
Bureau of Publications, Columbia University, 1911. 79 pp.

4 Blan, L. B. “A Special Study of the Incidence of Retardation,” Teachers
College, Columbia University Contributions to Education, No. 40. New York:
Bureau of Publications, Columbia University, 1911. 111 pp.

5 Meyer, Max. “The Grading of Students,” Science, 28: 243-52, 1908.

6 Dearborn, W. F. “The Relative Standing of Pupils in High School and in
the University,” University of Wisconsin Bulletin, No. 312, 1909. 44 pp.

7 Starch, Daniel, and Elliott, E. C. “Reliability of Grading High School
Work in English,” School Review, 20: 442-57, September, 1912.

8 Kelly, F. J. “Teachers' Marks,” Teachers College, Columbia University
Contributions to Education, No. 66. New York: Bureau of Publications, Columbia
University, 1914. 139 pp.

9 “Research within the Field of Education, Its Organization and
Encouragement,” School Review Monographs, No. 1. Chicago: University of Chicago
Press, 1911. 71 pp.

The major portion of this volume consists of four papers:
Cubberley, E. P. “Fundamental Administrative Problems.”
Dearborn, W. F. “Experimental Education.”
Monroe, Paul. “Cooperative Research in Education.”
Thorndike, E. L. “Quantitative Investigations in Education with Special
Reference to Cooperation within This Association.”

with Simon the Binet-Simon General Intelligence Scale ¹
in 1905. This scale was introduced in the United States by
Goddard, who published a standardization for American
children in 1910. Other revisions were published in 1912 by
Kuhlmann and by Terman and his coworkers.²

The work of Rice served as a stimulus to Thorndike, Stone,
Courtis, and others. Thorndike published the first book dealing
directly with mental measurement ³ in 1904. Courtis, who had
cooperated with Stone ⁴ by administering his tests, constructed
a group of arithmetical tests designated as Series A which were
made available for use in September, 1909. Thorndike's
Handwriting Scale was published in March, 1910, and the Hillegas
English Composition Scale in 1912.

In an address before the Harvard Teachers Association in
March, 1912, Ayres commented on the change in attitude since
Rice made his report at the meeting of the Department of
Superintendence in 1897:

Last week, in the City of St. Louis, that same association of school
superintendents, again assembled in convention, devoted forty-eight
addresses and discussions to tests and measurements of educational
efficiency. The basal proposition underlying this entire mass
of discussion was that the effectiveness of the school, the methods,
and the teachers must be measured in terms of the results secured.

This change represents no passing fad or temporary whim. It
is permanent, significant, and fundamental. It means that a
transformation has taken place in what we think as well as in what we
do in education.⁵

1 Binet, A., and Simon, T. “Méthodes Nouvelles pour le Diagnostic du Niveau
Intellectuel des Anormaux,” L'Année Psychologique, 11: 191-244, 1905.

2 For a brief historical account of intelligence testing, see Pintner, Rudolph.
Intelligence Testing. New York: Henry Holt and Company, 1923, Chapters I,
II, and III.

3 Thorndike, E. L. An Introduction to the Theory of Mental and Social
Measurement. New York: Teachers College, Columbia University, 1904. 277 pp.
(Revised edition, 1913.)

4 Stone, C. W. “Arithmetical Abilities and Some Factors Determining Them,”
Teachers College, Columbia University Contributions to Education, No. 10.
New York: Bureau of Publications, Teachers College, Columbia University,
1908. 101 pp.

5 Ayres, L. P. “Measuring Educational Processes through Educational
Results,” School Review, 20: 300-01, May, 1912.

The final stand of those in opposition to the use of educational
tests and the interpretation of the resulting average scores by
comparison with established norms was made on the proposition
that such practices would result in a standardization of
education that would be opposed to instructional efficiency.
At the meeting of the Department of Superintendence in February,
1915, the National Council of Education planned its
program so as “to give full hearing to those who are skeptical
about the desirability of standardization, tests, measurements,
and other exact forms of evaluating school work.” The program
of the first session of this organization bore the general title,
“Standardization, Wise and Otherwise.” Opportunity was
given to some of the outstanding opponents of measurement to
present their case. Referring to this meeting ten years later,
C. H. Judd said:

There are many here who will recall the meeting of the National
Council in 1915 when the forces of conservatism gathered for a last
stand and a battle was fought to determine whether measurement
of mental and moral traits was to be recognized as permissible.
There can be no doubt as we look back on that council meeting
that one of the revolutions in American education was accomplished
by that discussion. Since that day, tests and measures have gone
quietly on their way as conquerors should. Tests and measures are to
be found in every progressive school in the land. The victory of 1915,
slowly prepared during the preceding twenty years, was decisive.¹

The present attitude toward educational research is one of 
confidence in its possibilities. Superintendents look to educa- 
tional research for the construction of the curriculum, for the 
determination of the relative merits of various practices, for the 
evaluation of textbooks, and for the answers to many other 
questions in the field of education. Teachers expect educational 
research to tell them the relative merits of various methods and 
devices of teaching. Many schoolmen are apparently looking
forward to the time when most, if not all, questions which are

1 Judd, C. H. “The Curriculum: A Paramount Issue,” Addresses and
Proceedings of the National Education Association, Vol. 63. Washington: National
Education Association, 1925, pp. 806-07.

now confusing and hence are a constant source of irritation
because they must be thought about, will have been answered
by educational research and answered conclusively so that it
will no longer be necessary for one’s peace of mind to be
disturbed by trying to think about them.

Development of techniques for educational research. A 
second phase of our progress toward a science of education is 
the development of research techniques. Before the publication 
of Rugg’s text¹ in 1917, our principal sources in regard to
statistical methods were Thorndike, Theory of Mental and Social
Measurement, and Yule, An Introduction to the Theory of Statistics.
The citations in Chapters IV, X, and XI to recent writings
in this field indicate the contributions to statistical methods,
and the development of instruments for measuring human
traits and abilities and of other research techniques is reflected
in the discussion of other chapters of this volume. It is apparent
that marked progress has been made in this phase of the
development of a science of education.

Research activities in the field of education. It is apparent
that there has been much activity in the field of educational
research during recent years, but the mention of certain facts
will serve to make the picture of this phase of our progress
toward a science of education more definite. The report of the
New York School Inquiry, 1911-1912, included the recommendation
that a “Bureau of Investigation and Appraisal” be
established. As a result of this recommendation, a Division of
Reference and Research was established in 1913. Similar
departments were organized in other cities: Baltimore, 1912;
Rochester, N. Y., 1913; New Orleans, 1913; Boston, 1914;
Kansas City, Missouri, 1914; Detroit, 1914; Schenectady,
N. Y., 1914; Oakland, California, 1914. The establishment of
departments of educational research in educational institutions
was due largely to the suggestion of S. A. Courtis, who had
developed the idea of comparative testing advocated by Rice.

1 Rugg, H. O. Statistical Methods Applied to Education. Boston: Houghton
Mifflin Company, 1917. 410 pp.

At first, Courtis directly solicited the cooperation of
superintendents and teachers in standardizing the tests he devised.
As the interest in the testing movement grew, he foresaw the
desirability of having centers in each state for distributing the
tests, receiving and compiling the scores obtained, and inquiring
into conditions that appeared unusual. Such centers were
established at the University of Oklahoma, 1913; Indiana
University, 1914; Kansas State Normal School, Emporia,
1914; University of Iowa, 1914; University of Minnesota, 1915.
The first state bureau was the Division of Educational Tests
and Measurements of the Wisconsin State Department of
Public Instruction, organized in 1916. The Iowa Child Welfare
Research Station was authorized by the Iowa General Assembly
in 1917, and the Bureau of Educational Research of the
University of Illinois was established by action of the Board of
Trustees in 1918.

Beginning about 1920 there is evident a definite effort to 
increase the production of educational research. For example, 
in 1921 the directors of the Commonwealth Fund appropriated 
$100,000 a year for a period of five years to be used in subsidizing 
investigations by individuals and organizations. The attitude
of the committee administering the fund is indicated in the 
following paragraph from a statement issued by the secretary 
of the committee at the end of the first year: 

The Educational Research Committee believes that there should 
be many more appeals for subventions than have thus far come to it 
and that requests should be made by a much wider range of institu- 
tions. Indeed the conditions of the grant and the policy of the com- 
mittee are so flexible that any first-class project which can be clearly 
defined and budgeted is likely to receive favorable consideration. 
The committee meets three times a year, in the autumn, in the 
early spring, and in the early summer.¹

One of the features of a volume published in 1926 was a 
plea for research by teachers.² In connection with such appeals,

1 Editorial. Elementary School Journal, 22: 404, February, 1922.

2 Buckingham, B. R. Research for Teachers. New York: Silver, Burdett and
Company, 1926. 386 pp.

teachers were told that participation in experimentation and other
types of educational research is relatively simple and requires
little if any special training. For example, in the book just
referred to, it is asserted that “it is by no means necessary that
you should set up formal experiments involving control groups
in order to serve the cause of education as a research worker.”¹
A similar assertion was made in an editorial announcement in
the English Journal for February, 1923, p. 138. The editor
proposed an experiment to determine the relative merits of
two instructional procedures. After explaining the plan of the
inquiry and soliciting the cooperation of teachers, the writer
stated:

No technical training in the use of measurements will be necessary,
and there will be no great additions to the teacher’s out-of-class
labors. Only the collection of a few samples of his own pupils’
compositions and fairly close adherence to definite teaching policies in
two classes — these will be the total burden of each co-operator.

An indication of the amount of research activity is afforded 
by the number of doctors’ degrees conferred in education. 
Table XVI shows the number by years from 1918 to 1932. From
1918 to 1922 inclusive, the average number per year was 55,
68 for 1922 being the largest. In 1923, the number rose to 93,
in 1926 to 181, and in 1932 to 337. Further evidence of
the attainment of quantity production in educational research
is afforded by the annual bibliographies of educational research
published by the United States Office of Education. For example,
the bibliography published in 1928 and referring to
research reported in 1926-1927 included 1540 titles. The
bibliography published in 1931 lists 4651 studies reported in the
years 1929 and 1930. In commenting on this list of studies, an
editorial in the January, 1932, issue of School Life asserts that
on a conservative basis the 4651 studies listed represent a total
expenditure of time and money of not less than ten million
dollars.


1 Ibid., p. 377.

Table XVI. Number of Doctors' Degrees in Education

Year      Number
1918          53
1919          50
1920          61
1921          43
1922          68
1923          93
1924         110
1925         137
1926         181
1927         189
1928         189
1929         218
1930         265
1931         308
1932         337
Total       2302
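
The summary figures quoted in the text can be checked against Table XVI. The following Python sketch is an editorial illustration, not part of the original study; the variable names are assumptions, and the counts are copied from the table.

```python
# Doctors' degrees in education by year, copied from Table XVI.
degrees = {
    1918: 53, 1919: 50, 1920: 61, 1921: 43, 1922: 68,
    1923: 93, 1924: 110, 1925: 137, 1926: 181, 1927: 189,
    1928: 189, 1929: 218, 1930: 265, 1931: 308, 1932: 337,
}

# Average for 1918 to 1922 inclusive, the period singled out in the text.
early_years = [degrees[year] for year in range(1918, 1923)]
early_average = sum(early_years) / len(early_years)

# Grand total for the fifteen years covered by the table.
total = sum(degrees.values())

# Ratio of the 1932 figure to the 1918-1922 average.
growth_ratio = degrees[1932] / early_average

print(f"Average 1918-1922: {early_average:.0f}")
print(f"Total 1918-1932:   {total}")
print(f"1932 relative to the early average: {growth_ratio:.1f}x")
```

Computed this way, the five-year average and the fifteen-year total agree with the figures reported in the text and in the last row of the table.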


Contributions to a science of education. The most significant 
measure of our progress toward a science of education is to be 
found in the accumulation of contributions to it. The attain- 
ment of “quantity production” in research activities suggests a 
large accumulation of contributions. It is apparent, however, 
that many of the investigations commonly designated as edu- 
cational research do not contribute to a science of education, 
at least, directly. A more conservative picture¹ of the
contributions is furnished by the bibliographies of the summaries
published in the Review of Educational Research, 1931-1933.
During this three-year period, the fifteen numbers of this jour- 
nal were planned to cover the entire field of educational re- 
search. Hence, the studies listed may be thought of as repre- 
senting the “cream” up to the time the several summaries 
were prepared. The general titles of the fifteen issues of the 

1 An interesting picture of the growth of educational research since 1890 has
been contributed by Franke and Davis, who classified 2837 articles from 13
periodicals appearing during this period.

Franke, P. R., and Davis, R. A. “Changing Tendencies in Educational
Research,” Journal of Educational Research, 23: 133-45, February, 1931.

journal and the numbers of items in the bibliographies are as
follows:

1931
    The Curriculum                                              303
    Teacher Personnel                                           458
    School Organization                                         300
    Special Methods in the Elementary School                    438
    Psychology in the School Subjects                           884

1932
    Special Methods on High School Level                        276
    Finance and Business Administration                         449
    Tests of Personality and Character                          282
    Tests of Intelligence and Aptitude                          450
    School Buildings, Grounds, Equipment,
        Apparatus, and Supplies                                 476

1933
    Educational Tests and Uses                                  467
    Mental and Physical Development                             433
    Pupil Personnel, Guidance, and Counseling                   793
    Psychology of Learning, General Methods of
        Teaching, and Supervision                               457
    The Legal Basis of Education                                398

    Total                                                      6864
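
The total above can be verified, and broken down by year, with a few lines of arithmetic. The Python sketch below is an editorial illustration only; the grouping by year follows the list above, and the variable names are assumptions.

```python
# Bibliography item counts for the fifteen issues of the Review of
# Educational Research, copied from the list above and grouped by year.
items_by_year = {
    1931: [303, 458, 300, 438, 884],
    1932: [276, 449, 282, 450, 476],
    1933: [467, 433, 793, 457, 398],
}

# Subtotal for each year and the grand total for the three-year cycle.
yearly_totals = {year: sum(counts) for year, counts in items_by_year.items()}
grand_total = sum(yearly_totals.values())

# Average number of items per issue across all fifteen issues.
issue_count = sum(len(counts) for counts in items_by_year.values())
mean_per_issue = grand_total / issue_count

for year, subtotal in yearly_totals.items():
    print(f"{year}: {subtotal} items")
print(f"Grand total: {grand_total} items in {issue_count} issues")
print(f"Mean per issue: {mean_per_issue:.1f}")
```

The yearly subtotals are a convenience for comparison with later cycles; they are derived here and are not reported in the original list.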


There is some duplication among the several bibliographies, but 
the additions due to duplication are probably much less than 
the number of unpublished studies not listed and the studies 
omitted because they had been dealt with in published sum- 
maries. Hence, the above total is probably a conservative 
measure of the better educational research completed up to 
the time that these summaries were prepared. The totals for 
1934 and 1935, the first two years of the second cycle, are
materially greater than those for 1931 and 1932. This increase is
due in part to a more thorough canvass of educational literature 
for reports of researches and to the inclusion of a larger number 
of unpublished studies, but it is likely that the production rate 
has increased. 

The status of the science of education. An estimate of the 
status of the science of education should recognize the nature
of its content. A science of education will include factual
statements of the characteristics of defined populations of children
such as age groups, gifted children, and the like, but these items
will be incidental to statements of relationships and laws. A
large section of the science of education will deal with the
identification of the factors that affect human learning under
school conditions,¹ and with the relationship between these
factors and pupil achievement. Educational tests and other
instruments for measuring human abilities and traits will
receive attention, but the treatment will not be merely a
description of their structure and directions for their application. At
present there is much ignorance concerning what we measure.
In a science of education there will be more adequate definition
of the various abilities and traits that we attempt to measure.
It is probable that standardized units of measurement will be
defined.

It should be emphasized that statements of relationship, 
which will be prominent in a science of education, are generaliza- 
tions. A sample of a gas has the same characteristics as any 
other sample, and hence the relationships between volume,
pressure, and temperature determined from it are applicable
to any mass of the gas. Human beings are characterized by
individual differences, and the relationships we seek in the field
of education are for averages within given populations. An
experimental study of the relative effect of two instructional
procedures is designed to reveal the difference between the
average effects for a particular population.² The findings for
another population may be different and hence in a science of 
education there will be statements of relationships for various 
typical populations. 
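
The point that a finding is a difference between averages, and holds only for the population studied, can be made concrete with a small sketch. The Python example below is an editorial illustration: the scores, the population labels, and the method labels are all invented for the purpose and do not come from any study cited in this volume.

```python
from statistics import mean

# Hypothetical test scores, invented purely for illustration.
# Each population was taught by two instructional procedures.
scores = {
    "population_A": {
        "method_1": [72, 75, 78, 80, 85],
        "method_2": [70, 71, 74, 77, 78],
    },
    "population_B": {
        "method_1": [60, 62, 65, 66, 67],
        "method_2": [63, 66, 68, 70, 73],
    },
}

def average_difference(groups):
    """Return the mean score under method_1 minus the mean under method_2."""
    return mean(groups["method_1"]) - mean(groups["method_2"])

# One statement of relationship per population, not one for all pupils.
differences = {name: average_difference(groups) for name, groups in scores.items()}

for name, difference in differences.items():
    print(f"{name}: difference between average effects = {difference:+.1f}")
```

With these invented figures the first procedure shows the higher average in one population and the lower average in the other, although individual pupils in either group may depart from the average. A generalization drawn from either population alone would therefore misdescribe the other.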

The picture of the status of the science of education suggested 

1 For an indication of what is involved, see pages 278-89. The interested
reader should consult also Courtis, S. A. “Factors Conditioning Growth,”
Papers of the Michigan Academy of Science, Arts and Letters, 10: 349-67, 1928.

2 In theory the population might be an individual pupil with certain
characteristics, but in such a case the findings would have such limited application
that the value would be negligible.

by the bibliographies in the first three volumes of the Review
of Educational Research is too optimistic. Examination of the
summaries reveals that many of the authors were not very
critical in compiling their bibliographies. Furthermore, many
of the dependable findings can be regarded as only minor or
incidental contributions to a science of education. We have
accurate information concerning the distribution of teachers’
marks in certain schools, the scores made upon certain tests
by certain groups of pupils under certain conditions, the salaries
of certain groups of teachers, the vocabularies of certain
textbooks, the historical and geographical allusions in certain
newspapers and periodicals, the eye-movements of certain readers,
the age-grade status of certain school populations, the time
allotted to the different studies in the elementary school
curriculum, and the like. Such information is useful, but a
compilation of it does not constitute a science of education. It is
possible, on the other hand, to point to a number of studies
whose conclusions are of a different sort and rightfully deserve
to be recognized as contributions to a science of education.
For example, research has given us valuable information regarding
the relation of eye movements to the reading process.
It has also contributed to the formulation of laws of learning.
We now know a great deal concerning the predictive value of
many measures. A critical survey of educational research
would doubtless reveal several hundred studies that should
be regarded as significant contributions to a science of
education.¹

The status of the science of education may be viewed also 
with reference to the character of the problems studied.
Freeman² has pointed out that a large share of educational research
has been devoted to disproving theories and hypotheses. It is,


1 Since the nature of a science of education suggests that controlled
experimentation is a very fruitful type of research, Chapter IX should be reviewed in
connection with the study of the present topic. The concluding pages are
especially applicable.

2 Freeman, F. N. “The Contributions of Science to Education,” School and
Society, 30: 107-12, July 27, 1929.

of course, important to have hypotheses subjected to experimental
verification, but when research shows that a hypothesis
is not tenable, only a negative contribution to the science of
education has been made. Positive contributions are necessary
for building up a science of education.

In Chapter II attention was called to the fundamental prob- 
lems of education. Relatively few research workers are directing 
their time and energies to these problems. Survey investigations 
reporting the status of current practices and conditions make 
only incidental contributions to a science. In the field of edu- 
cational measurements many instruments have been constructed 
but relatively few attempts have been made to identify and 
define the ability or trait whose measurement is desired. We 
have many scales for measuring teaching efficiency but there
is no authoritative and precise definition of what is designated
by this term. Factual items such as coefficients of correlation 
between certain measures are not useful in building up a science 
of education when they refer to undefined populations or to 
measures whose validity is unknown. Many experimental 
studies have been based upon populations that are not repre- 
sentative or are so small that generalization is hazardous. 

A third indication of the status of the science of education 
is the relatively small number of experimental findings that 
have been verified. Although critical evaluation of reported 
studies will usually lead to a fairly trustworthy estimate of the 
dependability of the findings, the ultimate test of a generalization
is verification by a repetition of the investigation. In
chemistry, physics, and other scientific fields much importance
is attached to the corroboration of reported findings, but in the
field of education investigators interested in the same problem
have seldom employed sufficiently similar techniques to justify
precise comparison of their results. As a rule, educational
research workers have been much more interested in a new
investigation or in applying an improved technique than in repeating
a study for the purpose of testing the dependability of reported
findings.

The large investment in educational research during the past
fifteen years has been largely devoted to isolated studies rather
than to coordinated inquiries. In other words, our research
activities have not fitted into a general program. There have
been a few relatively comprehensive programs such as the
Terman Study of Gifted Children, the Educational Finance
Inquiry, the Modern Foreign Language Investigation, and the
Charters' Study of Motion Pictures and Youth, but for the most
part the studies bearing upon a general problem or topic are so
lacking in coordination that the findings can only be described
as fragmentary and a synthesis of them is so lacking in
completeness that generalization is not justified.¹

A fifth indication of the status of the science of education 
is the number of persons who have become definitely interested 
in educational research. The great volume of graduate theses 
in education and other research writings suggests that the 
number of such persons is very large. The same indication is
given by the number of elections to Phi Delta Kappa, an hon- 
orary society emphasizing interest in research in selecting its 
members, which now (1933) totals more than 15,000. The actual 
number of persons genuinely interested in educational research 
is, however, probably not very large. Brownell² states that
out of a total of seventy unpublished masters’ theses included 
by Gray in his summary of research relating to the field of 
reading up to 1929, only portions of twelve were later pub- 
lished. This condition, which is probably representative of 
masters’ theses in education, indicates either that the other 
fifty-eight theses were not considered worthy of a published 
account or that the authors did not have sufficient interest in 
their work to prepare an account for publication. A number 
of persons have demonstrated a persistent interest in educa- 
tional research by continuing their inquiries and publishing 
their findings, but the total of such persons is small in

1 Hartmann, G. W. “Laissez Faire versus Planning in Educational
Research,” School and Society, 39: 600-03, May 12, 1934.

2 Brownell, W. A. “The Growth and Nature of Research Interest in
Arithmetic and Reading,” Journal of Educational Research, 26: 440, February, 1933.

comparison with the number whose names appear once, or at most
twice, in comprehensive bibliographies.

A number of writers have expressed judgments relative to 
the status of the science of education. The following are perhaps 
typical. 

The chief service of contributions in the field of educational re- 
search up to the present time has undoubtedly been in pointing out 
problems and methods of approach.¹

We must use greater care to make certain that the conclusions
we state in our reports follow logically from the data presented. Too
many reports state conclusions that are not fully supported by the
research data included in them.²

Nevertheless, I cannot evade the conviction that, relatively
speaking, the published research in education is, on the whole,
inferior in quality, and more especially inferior in ultimate
significance, to the published research in other branches of scientific
endeavor. Too many contributions seem essentially futile. After you
read them, you feel like saying: “Well, suppose it is true; what of
it?”³

Writing in 1928, Courtis described the status of the science 
of education as that of “biased observation and uncritical 
acceptance of assumptions.”⁴ In support of this evaluation he
calls attention to our ignorance concerning what our instruments
measure and asserts that “we have not yet identified
what it is that is measured by any test.”

After an extensive inquiry into the facilities for educational 
research in public school systems and an examination of a large 
number of reports of research, Zeigel states that “the bureaus 
of research in city systems are concerned primarily with the
mere compilation of facts and statistics and consequently do
not meet the prerequisite qualities of educational research

1 Theisen, W. W. “Recent Progress in Educational Research,” Journal of
Educational Research, 8: 314, November, 1923.

2 Trabue, M. R. “Educational Research in 1925,” Journal of Educational
Research, 13: 344, May, 1926.

3 Whipple, G. M. “The Improvement of Educational Research,” School and
Society, 26: 251, August 27, 1927.

4 Courtis, S. A. “Education — A Pseudo-Science,” Journal of Educational
Research, 17: 131, February, 1928.

promulgated by leaders in education.”¹ In another place he states,
“the upshot of the facts presented is that the total extent and
the quality of the research carried on within school systems is
not highly commendable.”²

These evaluations of educational research may seem to be 
unduly critical and it may be pointed out that several of them 
are not recent.³ But it must be admitted that critical
contemplation of the accomplishments of educational research does
not lead to a very high estimate of the status of the science of
education.⁴ The accomplishments are not negligible, and in
view of the short period during which there has been persistent
effort to build up a science of education and the relatively slow
development of the physical sciences,⁵ we may with some
justification point to them with pride. But it seems a fair
statement to say that we are just beginning to be aware of the nature
of the task before us. Fundamental problems are beginning to
receive the attention of an increasing number of workers with
time and resources for research, and the faith they exhibit is a
significant indication.

The contributions of educational research to educational 
practice.⁶ An appraisal of the research movement in education
would be incomplete without directing attention to certain 
contributions to educational practice. The direct application 

1 Zeigel, W. H., Jr. “Research in Secondary Schools,” United States
Department of the Interior, Office of Education, Bulletin No. 17, National Survey of
Secondary Education Monograph, No. 15. Washington: United States
Government Printing Office, 1933, p. 66.

2 Ibid., p. 71.

3 Several of the more recent writings on this topic have been as critical as the
statements just quoted. For example, see

Symonds, P. M. “Common Faults in Graduate Research in Education,”
Journal of Educational Research, 27: 481-92, March, 1934.

4 The reader who has not studied the preceding chapters, especially VIII, IX,
and X, should read them in this connection.

5 Courtis has made an interesting comparison of progress in the science of
education with the development in other scientific fields. Courtis, S. A. “The
Construction of Measuring Instruments in the Field of Education,” Scientific
Monthly, 21: 260-90, September, 1925.

6 For a more extended discussion, see Monroe, W. S. “Service of Educational
Research to School Administrators,” American School Board Journal, 70: 37-39,
122, 125, April, 1925.

of research findings has resulted in a number of changes in our
schools, some of them destined to have far-reaching effects.¹
Another effect of the research movement is evidenced by the
utilization of statistical techniques,² educational tests, and other
instruments in the study of practical school problems.³ A third
contribution, although somewhat less tangible, is to be found
in the analysis and definition of problems studied by research
workers. Many practical problems which appear relatively
simple to the uninitiated have been revealed as highly complex.⁴

Progress toward a science of education retarded by a false
concept of research. Many graduate students and other persons
who have been introduced to educational research during recent
years have gained the impression that by carrying through
certain procedures, mostly routine in character, answers to
educational problems would be revealed. This is a false concept of
educational research. It is true that in certain types of
investigations there is much routine work, but even the routine
procedures of educational research are based upon assumptions that
must receive attention in interpreting the findings. Furthermore,
educational data involve errors both of measurement and of
validity which must receive attention. The movement for
quantity production served to “sell” the idea that educational
research is possible and desirable, but by trying to make it
appear simple, the leaders supporting this movement
contributed to building up the false concept just noted. The
requirement of a thesis for a graduate degree and the custom of
designating as research the activity of satisfying this requirement
have doubtless contributed to the same end. The engendering of
this false concept of research in education has been contributed


1 For a discussion of the ways in which research has modified educational
practice, see Judd, C. H. “Educational Research and the American School
Program,” Educational Record, 4: 165-77, October, 1923.

2 See Douglass, H. R. “The Contribution of Statistical Method to
Education,” School and Society, 35: 815-24, June 18, 1932.

3 For an account of the application of research techniques in the field of
administration, see Strayer, G. D. “The Scientific Approach to the Problems of
Educational Administration,” School and Society, 24: 685-95, December 4, 1926.

4 For an elaboration of this point, see pages 289-93 and 424.

to by many persons occupying positions of distinction but who
have been inadequately trained in statistical methods.
Frequently, such persons have accepted uncritically findings that
are not dependable and by giving publicity to them have caused
their audiences to believe that educational research is more
simple than it is.

Perhaps more important is the effect created by clever and 
fluent speakers and writers who decorate their arguments with 
uncritical citations of findings which are in agreement with their 
opinions. Such persons take advantage of the attitude toward 
research that has been built up, and create the impression of 
being scientific when they are not. The point was made in 
Chapter XII that questions which ask what should be, cannot be 
answered by means of objective methods. Many of the educa- 
tional problems that now command our attention are of this 
type, and a writer or speaker, who gives the impression that his 
answer to such a question has been demonstrated by research, 
takes an unjustifiable advantage of his audience. Eventually 
audiences will detect such uncritical thinking. They may pos- 
sibly conclude that such writers and speakers are lacking in 
sincerity. It is not unlikely that disillusionment will result in 
building up an indifferent, if not unfriendly, attitude toward 
educational research. This danger, which in the judgment of 
the present writers is a very real one, would be minimized if 
writers and speakers were more critical in their use of reported 
findings and were willing to have their assertions appear as a 
product of their own thinking. 

The retarding influence of this false concept of educational 
research is difficult to estimate, but to one who has been reading 
reports of studies for more than twenty years it appears to have 
been considerable. However, the mistakes of the past are be- 
hind us, and it is now clear that there should be concerted effort 
to engender a more adequate understanding of what is involved 
in educational research. The genuine friends of educational 
research should recognize their responsibility. They should be 
critical not only in their own investigations but also in evaluating
the work of others. They should seek an adequate understanding
of research techniques and refrain from giving uncritical
publicity to findings whose dependability is uncertain.
Graduate students and other amateurs should not be encouraged 
to undertake studies for which they do not have adequate 
training. Unfortunately, the treatment of research techniques 
in our better texts is in some cases inadequate, and in a few it is 
misleading or even erroneous. This condition adds to the difficulty
of attaining an adequate understanding, but persistent
efforts on the part of the genuine friends of educational research 
will gradually overcome this handicap. 

Crucial needs of educational research. The need for identifying
and formulating the assumptions basic to a science of education
is fundamental. Some attention has been given to this
matter,1 but we do not as yet have a comprehensive formulation.
As assumptions are formulated, they should be checked against
each other and against available research findings, and the
assumptions that appear reasonable should then be organized 
to serve as a basis for research directed toward building up a 
science of education. 

It is frequently asserted that the techniques of educational 
research should be refined and probably the statement would be 
accepted by all competent persons, but it remains to inquire 
what refinements are needed. The difficulties encountered in
educational research have been systematically noted in the pre- 
ceding chapters and it will be sufficient to note here only the 
more important needs for improvement. 

There is need for a more adequate understanding of statistical 
methods. Many research workers employ statistical techniques 

1 For illustrations see:

Courtis, S. A. “The Factor Concept in Education,” School and Society,
19:413-23, April 12, 1924.

Bode, B. H. “Where Does One Go for Fundamental Assumptions in Education?”
Educational Administration and Supervision, 14:361-70, September, 1928.

Freeman, F. N. “Psychology as the Source of Fundamental Assumptions in
Education,” Educational Administration and Supervision, 14:371-77, September,
1928.

Hendrickson, Gordon. “Some Assumptions Involved in Personality Measurement,”
Journal of Experimental Education, 2:243-49, March, 1934.
which they understand only imperfectly. This is especially true
of correlation techniques and probable error formulae. The
derivation of most statistical formulae is based upon assumptions
in regard to the data to which they are to be applied, and
other assumptions 1 are introduced in the use and interpretation
of the statistics resulting from the application of such formulae.
Without an understanding of the underlying assumptions, it is
easy to misinterpret statistics, and many instances of erroneous
or misleading use of statistical techniques are to be found in our
educational literature.
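
The dependence of such formulae upon assumptions may be illustrated by a brief computational sketch. The Python functions below (the names pearson_r and probable_error_of_r are illustrative and do not appear in this text) compute the product-moment coefficient and the conventional probable error of r, taken here as 0.6745(1 - r^2)/sqrt(N). That formula is not stated in this section; it is supplied as an assumption, and it is ordinarily defended only for a large random sample drawn from an approximately normal bivariate population.

```python
import math


def pearson_r(x, y):
    """Product-moment coefficient of correlation for paired measures."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("a variable with no variation has no correlation")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def probable_error_of_r(r, n):
    """Conventional probable error of r (assumed formula, see lead-in).

    Meaningful only under random sampling from a roughly normal
    bivariate population with a fairly large N.
    """
    if n < 1:
        raise ValueError("n must be positive")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)
```

The same value of r yields a smaller probable error as N increases, but the figure says nothing about systematic errors or an unrepresentative sample; those faults must be examined separately.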

A second need relates to the refinement of the measuring instruments.
Considerable attention has been given to improving
the reliability of measures, but increasing the reliability of
measuring instruments is not sufficient. Attention must be
given also to their validity. The fact that measures to which we
give different labels appear to overlap to a marked degree
indicates a need for precise definition of the measures yielded by
educational tests and other instruments.

The educational research worker seldom enjoys the privilege
of working with perfect data. Almost always they involve
errors of measurement. Frequently they involve errors of validity.
Sometimes the data as a group do not conform to the
assumptions upon which the statistical techniques are based,
and when generalization is attempted it is necessary to inquire
concerning the representativeness of the sample used. Hence,
it is imperative that research workers ascertain the faults of the
data with which they are working and make due allowance for
these faults in interpreting their findings. We have made considerable
progress in identifying the faults of data, but there is
need for further inquiry, especially relative to the causes and
magnitude of systematic errors of measurement and of validity.
The matter of sampling also should be studied.

Refinement of controlled experimentation involves more pre- 
cise definition of the experimental factor as well as more effective 

1 These assumptions are in addition to those referred to in the first paragraph 
of this section. 
control of non-experimental factors. There is need also for maintaining
a status of the non-experimental factors that is compatible
with sound educational practice. Longer periods of
experimentation and more representative groups of pupils will 
add greatly to the dependability of the findings. 

In addition to effecting such refinements of the techniques of
educational research, it is imperative from the point of view of 
building up a science of education that research workers direct 
their energies to the more fundamental problems. When we 
inquire into the problems that research workers have studied, it 
is found that relatively few of them have given attention to the 
fundamental problems of education noted in Chapter II. Much 
of the time and money devoted to educational research has been 
expended for survey investigations, many of them being trivial
or having only local significance. Relatively little progress 
toward the science of education will be attained until research 
workers generally devote themselves to the basic problems of 
education when it is considered as a science. 

In directing their efforts to the fundamental problems of 
education, research workers should engage in more long-time
investigations. This is especially needed in experimental studies 
in which a segment of achievement is the dependent variable. 
Few experiments have been continued more than a few months. 
In some cases the experimental period has been limited to a few
weeks. It appears probable that in many cases such short-time
investigations do not reveal adequately some of the more important
effects of controlled changes in the experimental factor.
There is also need for more intensive studies. The Chicago 
Reading Studies illustrate what may be accomplished when a 
worker devotes himself to a limited field of investigation for a 
period of years. 

The challenges of educational research. The fundamental 
problems are complex and difficult, but they constitute chal- 
lenges to the student of education interested in research. Until 
we have dependable solutions of them, or at least critically 
tested hypotheses, we will not have a foundation for a real 
science of education. Conventional test construction, fact-finding
surveys, and curriculum analyses are, in contrast, secondary
and in many instances trivial. The findings of such
investigations may be helpful in planning educational practice 
but their contribution to a science of education is, in most cases, 
negligible, and in the few instances where a contribution is 
made, the results are so fragmentary that their value is doubt- 
ful. Hence, those who are interested in developing a science of 
education should endeavor to comprehend the fundamental 
problems and to contribute to their solution. 

A concluding statement. Careful study of the preceding pages 
of this text should have revealed to the reader the position of 
its authors with respect to the relation of a science of education 
to the theory and practice of education. The fundamental
problems of education were stated in Chapter II. The last of
these problems was stated in terms of “what should be” and,
in Chapter XII, the importance of philosophical thinking in
dealing with this problem was stressed. It should be noted,
however, that the solution of the other fundamental problems
will yield generalizations whose application to educational practice
involves consideration of “what should be.” Research of
the type of multiple factor analysis will result in conclusions 
respecting elemental human abilities and traits. These con- 
clusions will be instrumental in the construction of better meas- 
uring instruments. More valid tests will aid in the attainment 
of more dependable experimental findings. Improved knowledge
respecting human abilities and traits and the factors influencing 
them will aid in the formulation of more defensible educational 
objectives, and, hence, in the construction of more adequate 
curricula. There is a danger that research workers engaged in 
the attempted solution of aspects of the fundamental problems 
will lose sight of the intimate relation between a science of 
education and the practice of education. The philosophical 
implications of the fundamental research of the “pure science”
type cannot be left to philosophers alone. Too often educational
philosophers are erratic in their thinking about generalizations
derived from scientific research in education. Some
of their criticisms have been well founded, but in numerous
instances, these philosophers have formulated theories about 
education without careful consideration of the faults of the 
research conclusions used as data in their thinking. In many 
cases, research workers in education have become lost in their 
enthusiasm for techniques, particularly the statistical ones. 
“Statistical” significance has, for these individuals, become of
greater importance than “practical” significance.




APPENDIX 


STATISTICAL SYMBOLS 

Symbols are used as a means of convenience in designating statistics
and in expressing formulae. Since they are employed as instruments
of communication, uniformity of usage is highly desirable. Unfortunately,
authors of statistical texts, and other writers employing
statistical symbols, exhibit many variations in usage.1 This lack of
uniformity increases the difficulty of reading reports of educational
research. A number of symbols are listed in this appendix for the convenience
of persons who desire to systematize their usage. The list
has been compiled after considerable study of the matter, and in most
cases the symbols given are strongly supported by current practice.
In a few cases a symbol not widely used is given because it appears to
represent a simple practice. No attempt has been made to supply
symbols for all statistics.

Unfortunately our symbolism has developed without much attention 
to general principles. A few principles, however, are rather generally 
observed by the more authoritative writers.

Designation of data: Test scores and other measures. A group
of data, such as the scores made on a test, chronological ages, school
marks, intelligence quotients, and the like, commonly thought of as
representing values of a variable, is usually designated by the symbol
$X$. A second variable may be designated by $Y$, but most writers
prefer to attach numerical subscripts to $X$ to designate the several
groups of data or variables. This practice has the advantage of being
capable of extension to any number of variables. When only two sets
of data are involved, the most common designations are $X_1$ and $X_2$,
but there are certain advantages in using $X_0$ to designate the variable
that is considered criterion or dependent. The independent variables
would be represented by $X_1, X_2, X_3, \ldots, X_n$.

When the raw data have been transformed so that they are expressed 
as deviations from their mean as the zero point, small letters are used 

1 West has reported the variation in usage in sixteen texts. He found different
symbols used to represent the same thing and a number of statistics that were
represented by two or more symbols.

West, P. V. “Need for Standardization of Symbols and Formulae in Educational
Statistics,” Journal of Experimental Education, 1:216-22, March, 1933.
instead of capitals. If they are further transformed so that they are
expressed in terms of their standard deviation ($\sigma$) as a unit, $z$ is used
as the symbol. When the data are expressed from an arbitrary zero
point, such as an assumed mean, a prime ($'$) is attached to the symbol.
A bar above a symbol, as in $\bar{X}$, indicates a value estimated by means of a
formula, usually a regression equation. An estimated value usually
has the meaning of “most probable” value.
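
The two transformations just described, raw measures to deviations from the mean and deviations to standard measures, may be sketched as follows. The function names are illustrative only, and the sketch assumes that the standard deviation is computed with N (not N - 1) in the denominator, which is not specified at this point in the text.

```python
import math


def deviation_scores(raw):
    """Express raw measures X as deviations x from their mean."""
    if not raw:
        raise ValueError("at least one measure is required")
    mean = sum(raw) / len(raw)
    return [value - mean for value in raw]


def standard_scores(raw):
    """Express raw measures as z, in units of the standard deviation."""
    x = deviation_scores(raw)
    # Assumption: sigma uses N in the denominator.
    sigma = math.sqrt(sum(d * d for d in x) / len(x))
    if sigma == 0:
        raise ValueError("all measures are equal, so z is undefined")
    return [d / sigma for d in x]
```

Under this assumption the standard measures of any group have a mean of zero and a standard deviation of one, which is what makes measures from different tests comparable.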

$A$, with subscripts as needed, is used to designate a variable that
is a component of $X$. When the variables are in terms of deviations,
$a$ is used for this purpose. Sometimes $a$, $b$, $c$, $d$, etc., are used instead
of $a$ with subscripts.

The number of measures (cases) in a group or population is commonly
designated by $N$. Small $n$ is used to represent the number of
variables or sets of measures. Subscripts may be attached to $N$ to
designate different populations or groups. When only two populations
are involved, $n$ is sometimes used to designate the smaller one.

The results of calculations. The results of certain calculations,
such as those designed to obtain a central tendency or a measure of
relationship, are commonly designated by a single letter: $M$ for mean,
$r$ for Pearson product-moment coefficient of correlation, $\sigma$ for standard
deviation, etc. There are a few exceptions, of which the use of Md for
median and PE for probable error are perhaps the most important.

The use of subscripts. It is usually desirable to connect a symbol
designating the result of certain calculations with the group or groups
of data from which it was obtained. This is accomplished by means
of subscripts. For example, $M_3$ indicates the mean of the measures
designated by $X_3$; $r_{24}$ indicates that the coefficient of correlation was
obtained from the measures of variables $X_2$ and $X_4$ or $x_2$ and $x_4$. In
writing regression equations, coefficients of partial correlation, and
certain other statistics, the system of subscripts is rather elaborate,
but when it is understood, the statistics are easy to write and read.
The use of sub-subscripts should be avoided when feasible. Thus, we
write $r_{23}$ instead of $r_{X_2 X_3}$, and $\sigma_{M_1 - M_2}$ instead of $\sigma_{D_{M_1 - M_2}}$. When an additional
subscript is required, it may be written as a prefix. Thus, we
write ${}_{\infty}X_0$ rather than $X_{0\infty}$, and ${}_{c}r_{12}$ rather than $r_{12c}$.

New symbols. A writer will facilitate the reading of his publications
by employing the symbols that are sanctioned by general usage. It
is unwise to follow a text or other source that does not represent good
practices. When a writer encounters a need for new symbols, he should
be guided by two general rules: (1) Avoid, if possible, the use of a letter
or other symbol that has any considerable usage for another purpose.
(2) Select a simple symbol. In general, use a single letter, with an appropriate
subscript if necessary. When two letters are used, omit periods.
A LIST OF RECOMMENDED SYMBOLS

AA  Achievement age, or synonymously accomplishment
age or attainment age.

AD  Average deviation, or mean deviation.

AQ  Achievement quotient.

AR  Achievement ratio.

$b_{01}$  Regression coefficient, involving $r$ (Pearson) unless
otherwise specified, of $X_0$ (dependent) on $X_1$ (independent)
or $x_0$ on $x_1$. $b_{01} = r_{01}\,\sigma_0 / \sigma_1$. See page 324.

$b_{01.23 \ldots n}$  Partial regression coefficient of $X_0$ on $X_1$, all others
of the $n$ independent variables constant. See page 325.

$\beta_{01.23 \ldots n}$  Beta regression coefficient. See page 325.

C  Constant in a regression equation. Used as subscript
designates control group.

CA  Chronological age.

CR  Critical ratio. $CR = D / \sigma_D$. See EC.

c  Correction, difference between assumed mean and
exact mean. As a subscript before $\sigma_1$ and $r_{12}$ indicates
that a correction for coarse grouping has been employed.

D  Difference. The quantities subtracted may be indicated
by a subscript. For example, $D_{10-90}$ designates
the 10-90 percentile range.

d  Deviation from the mean in terms of class or step
intervals. The preferred symbol for this purpose is
$x$. It should always be used when the deviation is in
terms of scale units. A prime (') is attached to indicate
a deviation from an assumed or arbitrary origin.

$d_{01}$  Coefficient of determination used in connection with
path coefficients. See page 393.

$d_{012}$  A coefficient of determination measuring the joint
effect of variables $x_1$ and $x_2$ on the variance of $x_0$.

E  Efficiency of prediction. For specific meanings as
indicated by subscripts see pages 340 f.
e  Designates the base of the Naperian system of logarithms
and equals 2.71828 . . . Also used to designate
the error in a measure.

$e_{var}$  Variable error of measurement.

$e_{sys}$  Systematic error.

EA  Educational age, expresses standing in a number
of school subjects.

EC  Experimental coefficient proposed by McCall.
$EC = D / (2.78\,\sigma_D)$. See CR.

EQ  Educational quotient.

f  Frequency within an interval.

G  Gain.

IQ  Intelligence quotient.

[symbol illegible in source]  Width of class or step interval in scale units.

K  A constant. (Not used in regression equations.)

k  Coefficient of alienation. $k = \sqrt{1 - r^2}$. When
squared is the coefficient of non-determination. See
$r$ for subscripts and their meaning.

$k_{1.23 \ldots n}$  Multiple alienation coefficient. When squared is the
multiple coefficient of non-determination.

$k_{12.34 \ldots n}$  Partial alienation coefficient.

M  Mean.

MA  Mental age.

Md  Median.

MdD  Median deviation.

MD  Mean deviation. This symbol is not recommended.
Use AD instead.

Mo  Mode. Sometimes designated by Z.

m  Number of variables when n is used in same formula.

N  Total number of cases or observations.

n  Number of variables. Also number of alternative
responses to an item on a multiple response test.
0  Used as a subscript to designate a criterion or dependent
variable.

P  With subscripts from 1 to 100 designates a percentile
point. For example, $P_{10}$ designates the tenth percentile,
the point on the scale of a frequency distribution
below which 10 per cent of the measures fall,
and $P_{90}$ designates the ninetieth percentile, the point
on the scale of a frequency distribution below which
90 per cent of the measures fall.

PE  Probable error, median deviation of the distribution
when it is one of errors. $PE = .6745\sigma$ where $\sigma$ represents
the standard error. PE is sometimes used incorrectly
for MdD where the distribution is not one
of errors.
For the use of this symbol with subscripts to designate
the probable error of particular quantities see $\sigma$
(standard error).

p  Probability of success, or of per cent of cases in a
given category. $p = 1 - q$.

[symbol illegible in source]  Symbol for path coefficient. See page 393.

Q  Quartile deviation. Sometimes called semi-interquartile
range. $Q = \frac{Q_3 - Q_1}{2}$.

$Q_1$  First (lower) quartile point. $Q_1 = P_{25}$.

$Q_3$  Third (upper) quartile point. $Q_3 = P_{75}$.

q  Probability of failure. $q = 1 - p$.

R  Multiple correlation coefficient. When squared it is
the coefficient of multiple determination.

$R_{12}$  Pearson product-moment coefficient of correlation in a
theoretical range of talent. Usually this range of
talent is larger than that for $r_{12}$. $\rho$ is used as a symbol
for the coefficient of “rank correlation,” but this
statistic is seldom calculated. R is also used to designate
the “number of right responses.”

r  Pearson product-moment coefficient of correlation.
Subscripts are used to indicate the two sets of paired
measures or variables whose correlation is being
expressed. The symbols $X_1$, $X_2$, etc., designating the
variables may be used as subscripts, but usually only
the subscripts of these symbols are attached to $r$.
Usually no significance is attached to the order in
which the two subscripts are written, but when it is
desired to identify the dependent variable ($Y$-ordinate
in correlation table) the first position may be given
to the subscript designating this variable. A few of
the more common cases of special subscripts are given
below.

$r_{1I}$  Coefficient of reliability, the subscripts designating
two measures of the same thing. We also write $r_{2II}$.
See pages 199 f.

$r_{\frac{1}{2}\frac{I}{2}}$  Coefficient of correlation between the two halves of a
test. See page 201.

$r_{1\infty}$  Correlation between obtained and theoretical true
measures. This symbol assumes that $X_1$ represents
the obtained measures. If another symbol is used to
designate these measures the subscript would
be changed accordingly. When $X_1$ designates test
scores $r_{1\infty}$ is read “index of reliability.”

$r_{\infty\omega}$  Coefficient of correlation corrected for attenuation.
See page 151.

$r_{12.34 \ldots n}$  Coefficient of partial correlation. See pages 377 f.

S  Sometimes used to designate summation. See $\Sigma$.

SD  Standard deviation, but this symbol is seldom used.
See $\sigma$.

s  Used as subscript to indicate errors due to random
sampling.

sys  Used as subscript to designate systematic error.

Sk  Skewness.

$t_{1234}$  Symbol for tetrad difference. $t_{1234} = r_{12}r_{34} - r_{13}r_{24}$.
See page 402.

V  Coefficient of variability, a measure of relative dispersion.
Often written C. of V.

var  Used as subscript to designate variable error.

X and Y  Are used to designate raw or observed measures in
two series of measures, or $X_1$ and $X_2$ may be used.
If there are several series of raw measures these may
be designated $X_1, X_2, X_3, \ldots, X_n$. Small letters
$x$, $y$, $x_1$, $x_2$ and so on represent measures expressed
as deviations from the means of the corresponding
raw measures. (See introductory statement.)

$X_\infty$  Symbol for true measures corresponding to the raw
or obtained measures designated by $X$. If there are
two or more $X$'s such as $X_0, X_1, X_2, \ldots$, the corresponding
true measures may be designated by ${}_{\infty}X_0$,
${}_{\infty}X_1$, ${}_{\infty}X_2 \ldots$ as a means of avoiding confusion.

$\bar{X}_0$  Symbol for predicted or estimated measure of variable
$X_0$. Similarly, $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$, etc., represent predicted
or estimated measures of variables $X_1$, $X_2$, $X_3$,
etc., where these are considered as dependent variables.


$\bar{X}_\infty$  Estimated true score; the variable is $X_1$. When it is
not, the symbols ${}_{\infty}\bar{X}_0$, ${}_{\infty}\bar{X}_2$, etc., may be employed.
See page 152.

z  Standard measure, i.e., a measure expressed from the
mean of the distribution as a zero point and
in terms of the standard deviation ($\sigma$) as a unit.
$z = \frac{X - M}{\sigma}$. The symbol $z$ is also used as an
ordinate of the normal probability curve having
unit area and unit standard deviation.

$\eta$  Eta, the ratio of correlation; measure of curvilinear
correlation. See page 98.

$\Sigma$  Summation, the sum of. Occasionally S is used for
this purpose, but except where special designation
is necessary, $\Sigma$ is preferable as a symbol. S is sometimes
used to indicate standard deviation of theoretical
or large range.

$\sum_{1}^{N} X$  The sum of the measures for individuals, 1 to $N$ inclusive.


$\sigma$  Sigma, the standard deviation of a distribution.
$\sigma = \sqrt{\Sigma x^2 / N}$. See pages 75 f. When the distribution is
one of errors, $\sigma$ is called the standard error. See pages
104 f. and 133. The particular distribution is indicated
by subscripts. A few special cases are given.

$\sigma_M$  Standard error of a mean, usually interpreted as the
standard error due to random sampling. If the data
are fallible, the effect of variable errors of measurement
is also included. See pages 156-57. When there is
any possibility for confusion, sub-subscripts should
be used as in the three following cases.

[symbol illegible in source]  Standard error of a mean due to random sampling
and to variable errors of measurement.

$\sigma_{M_{var}}$  Standard error of a mean due to variable errors of
measurement.

$\sigma_{M_s}$  Standard error of a mean due to random sampling.

$\sigma_{Md}$  Standard error of a median.

$\sigma_r$  Standard error of a coefficient of correlation.

$\sigma_D$  Standard error of a difference. When desired the
subscript $D$ may be replaced by the difference it
represents. Hence $\sigma_{M_1 - M_2}$ would indicate the standard
error of the difference $M_1 - M_2$. When necessary
to avoid confusion $r_{M_1 M_2} \neq 0$ or $r_{M_1 M_2} = 0$ may be
added to the subscript to make clear the formula
used. See page 105.

$\sigma_\infty$  True standard deviation or standard deviation of
true measures. $\sigma_\infty = \sigma_1\sqrt{r_{1I}}$. When desirable to
indicate the group of data or variable the symbol $\infty$
may be written as a pre-subscript.

$\sigma_{1.2}$  Standard error of estimate: standard deviation of the
differences between $X_1$ and the estimates of $X_1$ ($\bar{X}_1$)
made from $X_2$ by simple regression equation,
$\bar{X}_1 = r_{12}\frac{\sigma_1}{\sigma_2}X_2 + C$. We might write $\sigma_{X_1 - \bar{X}_1}$ but
this would not be a convenient symbolism. Hence
we use $\sigma_{1.2}$ in which 2 indicates the basis of estimate
and 1 the basis of comparison. $\sigma_{1.2} = \sigma_1\sqrt{1 - r_{12}^2}$.
In case the variables are $X_0$ and $X_1$ the standard error
of estimate would be expressed as $\sigma_{0.1}$. See pages 334 f.
$\sigma_{2.1}$  Standard error of estimate of $X_2$, the predictions being
made from $X_1$ by means of regression equation.
$\sigma_{2.1} = \sigma_2\sqrt{1 - r_{12}^2}$.

$\sigma_{\infty.1}$  Standard error of estimate where a regressed, or
estimated true score, $\bar{X}_\infty$, is taken as evidence of the
true score $X_\infty$. ($X_1$, $\bar{X}_\infty$, $X_\infty$ represent the same
function.) $\sigma_{\infty.1} = \sigma_1\sqrt{r_{1I} - r_{1I}^2}$. The symbol is also
used where $\bar{X}_0$ is taken as evidence of ${}_{\infty}X_0$, and the
formula for the standard error of estimate is
$\sigma_0\sqrt{r_{0O} - r_{01}^2}$. See page 336.

$\sigma_{1.\infty}$  Standard error of measurement where $X_1$ is taken as
evidence of $X_\infty$. Literally the standard error of
measurement is the standard deviation of the difference
$X_1 - X_\infty$ (variable error of measurement).
$\sigma_{1.\infty} = \sigma_1\sqrt{1 - r_{1I}}$. See pages 132-33.

$\sigma_{0.1234 \ldots n}$  Standard error of estimate of $X_0$ computed from
regression equation involving independent variables
$X_1, X_2, X_3, \ldots, X_n$. Same symbol is used when the
variables are expressed from their respective means
as zero points.

$\sigma_{\infty.1234 \ldots n}$  Standard error of estimate where $\bar{X}_0$ is taken as
evidence of true criterion measure ${}_{\infty}X_0$, multiple
regression equation being used with $X_1, X_2, X_3, \ldots, X_n$
as independent variables.

$\infty$  Infinity symbol; as a subscript indicates a true measure
of a variable, i.e., the mean of an infinite number
of measurements.

$\omega$  Omega, as a subscript, a true measure of a second
variable.
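
Several of the statistics named in the list above reduce to very short computations. The sketch below is offered only as an illustration: the function names are not the authors', and the formulae are written as they are conventionally given (coefficient of alienation as the square root of 1 - r^2, probable error as .6745 times the standard error, standard error of estimate as the standard deviation times the coefficient of alienation, and standard error of measurement as the standard deviation times the square root of 1 minus the reliability coefficient). Each rests on the assumptions attached to the underlying correlation or reliability coefficient.

```python
import math


def coefficient_of_alienation(r):
    """k = sqrt(1 - r^2) for a coefficient of correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt(1.0 - r * r)


def probable_error(standard_error):
    """PE = .6745 * sigma, where sigma is a standard error."""
    return 0.6745 * standard_error


def standard_error_of_estimate(sigma_1, r_12):
    """Standard deviation of the differences between X1 and its
    estimates made from X2 by a simple regression equation."""
    return sigma_1 * coefficient_of_alienation(r_12)


def standard_error_of_measurement(sigma_1, reliability):
    """Standard deviation of obtained minus true measures, given the
    standard deviation of obtained measures and their reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sigma_1 * math.sqrt(1.0 - reliability)
```

For instance, with a standard deviation of 10 and a correlation of .60 between predictor and criterion, the standard error of estimate is 8, so that the prediction is only one-fifth better than a guess at the mean.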
Beta coefficients, 325, 332, 393;
interpretation, 394.

Bibliographies, 440 f. 


Bi-serial r, 101, 184, 236. 

Blakeman test, 102, 154. 

Calculation, 62 f ; economy in, 92 f. 

Causal variables, 367 f., contribu- 
tion, 371, 376 

Cause and effect relationships, 
366 f. 

Causes, elemental, 398; identifica- 
tion, 369 f 

Chance prediction, 341. 

Chapman-Sims Socio-Economic 
Scale, 54 

Checklist, 48. 

Chi-Square Test, 79. 

Classification of data, 225. 

Coding, 228 f 

Coefficient of alienation, 335 

Coefficient of contingency, 101 

Coefficient of correlation (product-moment), 86 f.; assumptions, 101 f.;
economy in calculating, 92 f.; effect of factor of heterogeneity, 375;
error due to grouping, 97; estimate for other populations, 110;
interpretation, 113 f., 350, 373 f., 384 f.; machine calculation, 96.

Coefficient of correlation other than 
product-moment. See Correla- 
tion. 

Coefficient of determination, interpretation,
397 f.

Coefficient of direct determination,
393.

Coefficient of joint determination,
393.

Coefficient of multiple correlation, 
337, 394, shrinkage, 361 f 

Coefficient of multiple determina- 
tion, 394 
Coefficient of reliability, 110, 199,
205 f.; interpretation, 385.
Coefficient of validity, 148; inter- 
pretation, 385 f. 

Coefficient of variability, 113, 234 
Collecting data, basic techniques, 
30. 

Common factor, 367, 403 
Comparable measures, 82 f. 
Comparative survey, 271 
Comparing frequency distributions, 
235 f 

Component variables, 381, 382, 391. 
Composite score, 195 
Computational aids, 64 
Concomitant variation, 366 
Consensus of opinion, 256, 423. 
Constant error, 124 See also Sys- 
tematic error 

Continuous scale of measurement, 
61 

Controlled experiment, 272 
Copying data from records and published sources, 30 f.
Correction for guessing, 191.
Correlation, 85 f.; bi-serial r, 101, 184, 236; contingency, 101;
interpretation, 350, 371; multiple, 337; non-linear, 98, 252; part, 383;
partial, 377 f.; rank, 101; semi-partial, 383; tetrachoric, 97, 101;
uses, 116. See also Coefficient of correlation (product-moment).
Correlation analysis, 366 f ; applica- 
tions, 405; illustrations, 406 f 
Correlation ratio, 98, 252 
Correlation table, 86 f 
Criteria of validity, 178 f , 207-8 
Critical ratio, 106, 249, 309 
Critical score, 349 

Curriculum construction, 419 f ; il- 
lustrations, 426 

Data, classification, 225 f , collec- 
tion, 28 f ; labels of, 121, 144; 
meaning of, 61. 

Data faults, 122 f ; effect of, 149 f. 
Decile points, 74 


Defining a problem, 23; experi- 
mental, 289 f ; survey, 215 f 
Dependability, 121, 149 f , 157 f ; 
coefficients of determination, 398; 
efficiency of prediction, 355, ex- 
perimental differences, 306 f , 
308 f ; generalization of coeffi- 
cients of reliability, 206, historical 
research, 167; norms, 248; partial 
correlation, 380 f , philosophy and 
science, 418, survey investiga- 
tions, 244 f , 248 f , 251 
Dependent variable, 275 f , 289 f , 
301, 324, 389 f 
Derived measures, 220 f 
Derived scales of measurement, 
193. 

Differential prediction, 332 f 
Difficulty, stability of, 187 
Difficulty of test exercises, 180, 186; 
relation to discriminating power, 
181 

Dimensions of ability, 188 
Direct measurement, 172. 

Discrete measures, 61.
Discriminating power of test exer- 
cises, 182 f 

Distribution, frequency, 66 f.; normal, 79.

Distributions, transformation to
normal, 84.

Doctors’ degrees, lists of, 441; num- 
ber in education, 460 
Doolittle method, 332. 

Educational problems, 12 f , 412 f ; 
definition, 23 f , experimental, 
270; fundamental, 16 f , historical, 
159 f ; measurement, 175, 176 f ; 
prediction, 323, purposes, 412 f ; 
survey, 215 f. 

Educational research, characteris- 
tics, 7 f , criteria of, 443 f.; crucial 
needs, 471 f ; definition, 1, false 
concept, 469 f ; history of, 453 f ; 
illustrations, 2 f , quantity pro- 
duction, 459 f , types of problems, 
15 f. 
Efficiency of prediction, 336, 340;
causes of errors in, 358 f.
Elemental causes, 398, 399 f.
Equivalent groups, 295 f.
Error due to grouping in broad categories,
97.

Errors, 123 See also Systematic 
errors, Variable errors 
Errors of estimate, 334 f , 337 f. 
Errors of measurement, 130 f 
Errors of sampling, 103 f , 156, 248 f 
Errors of validity, 129, 144 f , 253, 
307. 

Essay examination, 303, 387 
Estimated true measures, 152 
Evaluation and synthesis, 437 f 
Experimental coefficient, 308. See
also Critical ratio.
Experimental difference, 305, 306 f 
Experimental factor, 274, 276, 290 f 
Experimental problems, 270 
Experimental research, 274 f.; ac-
complishments, 310 f.; details of
procedure, 288 f.; future of, 313 f.;
illustrations, 314 f
Extra-school factors, 287 
Eye-movements, 3 

Factor, 367 
Factor analysis, 401 f 
Factor loadings, 401, 405 
Factor patterns, 381, 382, 401 f 
Factors contributing to pupil
achievement, 278 f.

False concept of educational re- 
search, 469 f. 

First quartile point, 73 
Forecasting, 238 f , 323 f 
Frequency distribution, 66 f ; con- 
version into per cents, 232, nor- 
mal, 79 

Frequency distributions, compari- 
son, 235. 

Fundamental problems, 16 f. 

Gains, measurement of, 303
General intelligence, 279. 

General school factors, 284 f. 


Generalization, 155; experimental,
308 f.; historical, 167; prediction,
354 f.; reliability, 205 f.; survey,
248 f

Geometric mean, 237 f. 

Grade scores, 194. 

Graduate theses, problems for, 19 f. 
Graphic methods, 239, 327 f. 
Graphic rating scale, 52. 

Group factors, 403, 404 
Guessing, correction for, 191. 

Halo effect, 142. 

Heterogeneity, 375 f 
Historical data, validity, 164.
Historical research, 159 f.; illustra-
tions, 168 f

Hollerith Machine, 65, 185 
Homoscedasticity, 102. 

Independent variable, 274, 275 f 
Index numbers, 221 f 
Index of reliability, 204
Indirect measurement, 129, 172 f. 
Intelligence quotients, 85 
Interpretation of statistics, 112 f.;
beta coefficients, 394; coefficient
of correlation, 113 f., 350, 373 f.,
384 f.; coefficient of determina-
tion, 397 f.; coefficient of reliabil-
ity, 385; coefficient of validity,
385 f.; efficiency of prediction,
340 f.; experimental difference,
306 f.; probable error, 105 f.;
probable error of measurement,
134; regression coefficients, 389 f.,
394; reliability coefficient, 385;
survey findings, 244 f.; variance
ratio, 372 f., 384 f. See also De-
pendability, Generalization.

Interviewing, 36 f. 

Kurtosis, 78. 

Law of the single variable, 188 
Linear relationship, 98 

Machine methods, 65, 94, 96, 185, 
229, 329 

Magnitude of a variable, 372. 
Man-to-man rating scale, 51 f
Matched groups, probable error of
difference, 309
Matching, 295 f 
Mean, 66, calculation, 68 f 
Mean, geometric, 237 f. 

Mean and the median, relative mer- 
its, 112 

Measurement, basic problems, 
175 f., meaning of, 172 f 
Measures, approximate, 61 
Measuring instruments, construc- 
tion, 171 f 

Mechanical recording, 3, 45 f.
Median, calculation, 72 f. 

Median deviation, 77 
Method of agreement, 370 f 
Mode, 74 

Moving averages, 235, 329 
Multiple correlation, definition and
formula, 337; shrinkage of co-
efficient, 361 f

Needed research, 25 f. 

Net achievement, 386. 

Newark phonics experiment, 4 
Non-experimental factors, 292, 298. 
Non-linear correlation, 98 
Non-representativeness of data, 155 
Normal correlation surface, 103. 
Normal curve, 79 f 
Normal distributions, identification, 
79; properties, 80 f 
Normal equations, 331 f , 397 
Norms, dependability, 248 

Objective data, 29 
Objective tests, 302 f 
Observing, 47 f 

Paired measures, 85 f. 

Part correlation, 383 
Partial correlation, 330, 377 f.; or-
der of coefficients, 378; precise
definition, 379 f

Partial correlation as a means of
identifying cause and effect re-
lationships, 388.


Partial regression, 331
Partial standard deviation, 330 f.
Path coefficients, 393, 395 f
Pearson's Chi-Square Test, 79
Per cents, averaging, 233 
Percentile points, 74 
Percentile ranks, 84, 241. 

Percentile scores, 195 
Performance test, 174, construction, 
176 f 

Philosophical method, 414 f 
Photographic recording, 3 
Predicting success in the first year 
of college, 351 f 

Predicting teaching success, 353 f 
Prediction, 323 f , differential, 332 f ; 

graphical methods, 327 f 
Prediction formulae, 323 
Predictions, accuracy of, 333 f 
Predictive efficiency, 337 f 
Probability integral, 80 f. 

Probable error, 77, 82, 104, 107 f.,
155 f.; interpretation, 105 f.; when
calculated, 108 f.

Probable error of measurement, 132, 
133, 150, 204, 206, interpretation, 
134 

Problems, definition, 23 f., 289 f.;
educational, 12 f., 412 f.; experi-
mental, 270; fundamental, 16 f.;
historical, 159 f.; measurement,
175 f.; prediction, 323; purposes,
412 f.; survey, 215 f
Problems for graduate theses, 19 f. 
Product-moment coefficient of cor- 
relation, 86 f 

Psychological questionnaires, 175, 
195 f., 203 
Pupil traits, 278 f. 

Pure guesses, 341 

Quality scales, 192. 

Quantity production of educational 
research, 459 f 
Quartile deviation, 75. 

Quartile points, 73 
Questionnaire, 40 f.; systematic er- 
rors in, 143 f. 
Random prediction, 341.

Random sampling, 57 f., 218, 249 f.,
297, 309; effect of, 103 f
Range, 74. 

Ranks, 223 f.; percentile, 224; trans-
formation into amount scores,
224

Rating, 51 f., 175, halo effect, 
142 

Ratios, 85, 220 f , 304; spurious cor- 
relation of, 388 
Recording activities, 46 f 
Regressed measures, 152 
Regression coefficients, 325; in- 
terpretation, 389 f , 394 
Regression equation, 99, 330 f , 324, 
calculation of, 330 f. 

Regression equations and factor 
analysis, 389 f 

Relationship, 85 f , 270 f , 366 f 
Reliability, 107, 177, 198, 208 f 
Reliability coefficient, 110, 199 f, 
205 f , interpretation, 385 
Reliability of lengthened test, 209 
Representativeness, 217 f. See also
Generalization
Rotation method, 297, 299 f 

Sampling, 57 f., 218. See also Ran-
dom sampling

Science of education, 453 f ; con- 
tributions, 461 f ; fundamental 
problems, 16 f , status, 462 f 
Scientific attitude, 453 f 
Score cards, 53. 

Self-rating, 143
Semi-partial correlation, 383 f 
Sheppard's correction, 97, 102, 153 f 
Sigma, 75 

Sigma index score, 84 
Significant figures, 62 f. 

Skewness, 78 

Spearman-Brown formula, 201 f , 
209. 

Spearman's "g" factor, 398, 403.
Spiral test, 176 
Spurious correlation, 200, 388 
Standard deviation, 75. 


Standard error, 82, 104. See also
Probable error.

Standard error of estimate, 333, in- 
terpretation, 337 f. 

Standard populations, 376 
Standard score regression equation, 
325. 

Statistical methods, elementary, 
61 f., texts, 118 f. 

Statistical significance, coefficient of 
correlation, 116 f , difference, 107, 
249, 253, 308 f 
Statistical symbols, 477 f 
Statistics, interpretation. See De-
pendability, Generalization, In-
terpretation of statistics
Stenographic recording, 47.
Subjective data, 29. 

Subjectivity, 131. 

Summaries, 438 f , 451 f 
Survey findings, interpretation, 
244 f 

Surveys, 216 f.; reporting, 242; il-
lustrations, 259 f
Symbols, statistical, 477 f
Systematic errors of measurement,
135, 246.

Systematic errors of validity, 146 f , 
149, 246 f. 

Tables, 64 

Tabulating machines, 229-31. 
Tabulating survey data, 227 f 
Taking notes on references, 447 f 
Teacher factors, 281 f. 

Teaching success, 353 f 
Test construction, 171 f. 

Test exercises, 178 f 
Tetrachoric correlation, 97, 101. 
Tetrad equation, 402 
Textbook analysis, 33 f , 57. 

Time series, 329 
Transforming data, 82 f , 223 f 
Trends, 237 
True-false tests, 145 f. 

True scores, 204 
T-scores, 83, 193 
Types of data, 28 f. 
Uncorrelated components, 381.
Uniform test, 176 
Units of measurement, 189, 193 
Universe, 103, 108. 

Validity, 129 f , 144 f , 177, 187, 198, 
207 f , coefficient, 148, criteria, 
178 f 

Validity of historical data, 164 
Validity of individual test exercises, 
181 f. 

Values, problems of, 412 f 
Variability, 74 f , 113; coefficient, 
113, 234. 

Variability of test performance, 
130 f. 


Variable, 273; magnitude of, 372.
See also Dependent variable, In-
dependent variable

Variable errors, 125, 149 f 

Variable errors of measurement,
130 f.; description, 132 f., 199 f.,
245

Variable errors of validity, 144 f.,
148, 207

Variance, 372 

Variance ratio, 372 f , 384 f. 

Vocabulary studies, 57 f. 

Weighting, 189 f , 405 

Work units, 189 

Zero-order coefficients, 378. 




